HUMAN TRANSCRIPTOMES

This invention was made with government support under CA57345, CA62924, and 
CA43460 awarded by the National Institutes of Health. The government has certain rights 
in the invention.

BACKGROUND OF THE INVENTION 

The characteristics of an organism are largely determined by the genes expressed 
within its cells and tissues. These expressed genes can be represented by transcriptomes 
that convey the identity and expression level of each expressed gene in a defined population 
of cells (1, 2). Although the entire sequence of the human genome will be elucidated in the 
near future (3), little is known about the many transcriptomes present in the human 
organism. Basic questions regarding the set of genes expressed in a given cell type, the 
distribution of expressed genes, and how these compare to genes expressed in other cell 
types, have remained largely unanswered. 

General properties of gene expression patterns in eukaryotic cells were determined 
many years ago by RNA-cDNA reassociation kinetics (4), but these studies did not provide 
much information about the identities of the expressed genes within each expression class. 
Technological constraints have limited other analyses of gene expression to one or few 
genes at a time (5-9) or were non-quantitative (10, 11). Serial analysis of gene expression 
(SAGE) (12), one of several recently developed gene expression methods, has permitted 
the quantitative analysis of transcriptomes in the yeast Saccharomyces cerevisiae (1, 13).
This effort identified the expression of known and previously unrecognized genes in S.
cerevisiae (1, 14) and demonstrated that genome-wide expression analyses were
practicable in eukaryotes.

Thus, there is a need in the art for the identification of transcriptomes which represent 
gene expression in particular cell types or under particular physiological conditions in 
eukaryotes, particularly in humans. 



SUMMARY OF THE INVENTION

It is an object of the present invention to provide such transcriptomes, individual 
polynucleotides, and methods of using the polynucleotides to identify particular cell types, 
screen for useful drugs, reduce cancer-specific gene expression, standardize gene 
expression, and restore function to a diseased cell or tissue. These and other objects of the 
invention are provided by one or more of the embodiments described below. 

One embodiment of the invention is a method of identifying a cell as either a colon 
epithelial cell, a brain cell, a keratinocyte, a breast epithelial cell, a lung epithelial cell, a 
melanocyte, a prostate cell, or a kidney epithelial cell. Expression in a test cell of a gene 
product of at least one gene is determined. The at least one gene comprises a sequence 
selected from at least one of the following groups: 

(a) the sequences shown in SEQ ID NOS:2, 5-18, 20-84, and 85; 

(b) the sequences shown in SEQ ID NOS:87-96, 98, 100-103, 105, 107-110, 112-129,
131-150, and 151;

(c) the sequences shown in SEQ ID NOS:152-154 and 155;

(d) the sequences shown in SEQ ID NOS:156-159 and 160;

(e) the sequences shown in SEQ ID NOS:161-166 and 167;

(f) the sequences shown in SEQ ID NOS:168, 170, 172-177, 179-188, 190-207, and 208;

(g) the sequences shown in SEQ ID NOS:209 and 210; and

(h) the sequences shown in SEQ ID NOS:211-224 and 225.

Expression of a gene product of at least one gene comprising a sequence shown in (a) 
identifies the test cell as a colon epithelial cell. Expression of a gene product of at least one 
gene comprising a sequence shown in (b) identifies the test cell as a brain cell. Expression 
of a gene product of at least one gene comprising a sequence shown in (c) identifies the test
cell as a keratinocyte. Expression of a gene product of at least one gene comprising a 
sequence shown in (d) identifies the test cell as a breast epithelial cell. Expression of a 
gene product of at least one gene comprising a sequence shown in (e) identifies the test cell 
as a lung epithelial cell. Expression of a gene product of at least one gene comprising a 
sequence shown in (f) identifies the test cell as a melanocyte. Expression of a gene product 
of at least one gene comprising a sequence shown in (g) identifies the test cell as a prostate 
cell. Expression of a gene product of at least one gene comprising a sequence shown in (h) 
identifies the test cell as a kidney epithelial cell. 

Another embodiment of the invention is an isolated polynucleotide comprising a 
sequence selected from the group consisting of SEQ ID NOS:2, 5, 6, 8, 10, 12, 13, 15, 17, 
18, 21, 24-26, 28, 30, 31, 34-36, 38, 40, 47-51, 53-57, 59-62, 65-69, 71-76, 78, 80-84, 98,
103, 113, 115, 122, 129, 132, 134, 135, 140, 144, 149, 150, 153-168, 174-176, 182, 185,
186, 188, 190, 200, 201, 205-213, 216-224, 237, 239, 257, 263, 485, 487, 495, 499, 514,
586, 686, 751, 835, 844, 878, 910, 925, 932, 951, 1000, 1005, 1070, 1122, 1130, 1170,
1173, 1187, 1189, 1200, 1213, 1220, 1237, 1257, 1264, 1273, 1293, 1300, 1320, 1367, 
1371, 1401, 1403, 1404, 1406, 1418, and 1419. 

Still another embodiment of the invention is a solid support comprising at least one 
polynucleotide. The polynucleotide comprises a sequence selected from at least one of the 
following groups: 

(a) the sequences shown in SEQ ID NOS:2, 5, 6, 8, 10, 12, 13, 15, 17, 18, 21, 24-26, 
28, 30, 31, 34-36, 38, 40, 47-51, 53-57, 59-62, 65-69, 71-76, 78, 80-83, and 84; 

(b) the sequences shown in SEQ ID NOS:98, 103, 113, 115, 122, 129, 132, 134, 135,
140, 144, 149, and 150;

(c) the sequences shown in SEQ ID NOS:153-154 and 155;

(d) the sequences shown in SEQ ID NOS:156-157 and 160;

(e) the sequences shown in SEQ ID NOS:161-166 and 167; 

(f) the sequences shown in SEQ ID NOS: 168, 174-176, 182, 185, 186, 188, 190, 200, 
201, 205-207 and 208; 

(g) the sequences shown in SEQ ID NOS:209 and 210;

(h) the sequences shown in SEQ ID NOS:211-213, 216-223, and 224;

(i) the sequences shown in SEQ ID NOS:237, 239, 257, and 263; or

(j) the sequences shown in SEQ ID NOS:485, 487, 495, 499, 514, 586, 686, 751, 835,
844, 878, 910, 925, 932, 951, 1000, 1005, 1070, 1122, 1130, 1170, 1173, 1187, 1189, 1200,
1213, 1220, 1237, 1257, 1264, 1273, 1293, 1300, 1320, 1367, 1371, 1401, 1403, 1404,
1406, 1418, and 1419.

Even another embodiment of the invention is a method of identifying a test cell as 
a cancer cell. Expression in a test cell of a gene product of at least one gene is determined. 
The at least one gene comprises a sequence selected from the group consisting of SEQ ID 
NOS:228, 230-257, 259-260, and 262-265. An increase in expression of at least two-fold 
relative to expression of the at least one gene in a normal cell identifies the test cell as a 
cancer cell. 

Yet another embodiment of the invention is a method of reducing expression of a 
cancer-specific gene in a human cell. A reagent which specifically binds to an expression 
product of a cancer-specific gene is administered to the cell. The cancer-specific gene 
comprises a sequence selected from the group consisting of SEQ ID NOS:228, 230-257,
259-260, and 262-265. Expression of the cancer-specific gene is thereby reduced relative 
to expression of the cancer-specific gene in the absence of the reagent. 

Even another embodiment of the invention is a method for comparing expression of 
a gene in a test sample to expression of a gene in a standard sample. A first ratio and a 
second ratio are determined. The first ratio is an amount of an expression product of a test 
gene in a test sample to an amount of an expression product of at least one gene comprising 
a sequence selected from the group consisting of SEQ ID NOS:266-375, 377-652, 654-796, 
and 798-1448 in the test sample. The second ratio is an amount of an expression product 
of the test gene in a standard sample to an amount of an expression product of the at least 
one gene in the standard sample. The first and second ratios are compared. A difference 
between the first and second ratios indicates a difference in the amount of the expression 
product of the test gene in the test sample. 
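
The two-ratio comparison described above can be illustrated with a minimal Python sketch. It assumes each sample is summarized by two measured amounts, one for the test gene and one for a ubiquitously expressed reference gene; the function names and the choice to report the difference of the two ratios are illustrative assumptions, not part of the described method.

```python
def expression_ratio(test_gene_amount: float, reference_amount: float) -> float:
    """Amount of the test gene's product relative to a reference gene's product."""
    if reference_amount <= 0:
        raise ValueError("reference gene amount must be positive")
    return test_gene_amount / reference_amount


def compare_samples(test_sample, standard_sample):
    """Return (first_ratio, second_ratio, difference).

    Each sample is a (test_gene_amount, reference_amount) pair
    (hypothetical data layout chosen for this sketch).
    """
    first_ratio = expression_ratio(*test_sample)       # test sample
    second_ratio = expression_ratio(*standard_sample)  # standard sample
    return first_ratio, second_ratio, first_ratio - second_ratio


if __name__ == "__main__":
    # Invented amounts, for illustration only.
    print(compare_samples((120.0, 60.0), (50.0, 50.0)))
```

A nonzero difference indicates that, after normalization to the reference gene, the test gene is present at a different level in the test sample than in the standard sample.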

Still another embodiment of the invention is a method of screening candidate
anti-cancer drugs. A cancer cell is contacted with a test compound. Expression of a gene
product of at least one gene in the cancer cell is measured. The at least one gene comprises
a sequence selected from the group consisting of SEQ ID NOS:228, 230-257, 259, 260,
262-264, and 265. A decrease in expression of the gene product in the presence of a test
compound relative to expression of the gene product in the absence of the test compound 
identifies the test compound as a potential anti-cancer drug. 

Still another embodiment of the invention is a method of screening test compounds 
for the ability to increase an organ or cell function. A cell selected from the group consisting
of a colon epithelial cell, a brain cell, a keratinocyte, a breast epithelial cell, a lung 
epithelial cell, a melanocyte, a prostate cell, and a kidney cell is contacted with a test 
compound. Expression in the cell of a gene product of at least one gene is measured. The 
gene comprises a sequence selected from at least one of the following groups: 

(a) the sequences shown in SEQ ID NOS:2, 5-18, 20-84, and 85; 

(b) the sequences shown in SEQ ID NOS:87-96, 98, 100-103, 105, 107-110, 112-129,
131-150, and 151;

(c) the sequences shown in SEQ ID NOS:152-154 and 155;

(d) the sequences shown in SEQ ID NOS:156-159 and 160;

(e) the sequences shown in SEQ ID NOS:161-166 and 167;

(f) the sequences shown in SEQ ID NOS:168, 170, 172-177, 179-188, 190-207 and 208;

(g) the sequences shown in SEQ ID NOS:209 and 210; and

(h) the sequences shown in SEQ ID NOS:211-224 and 225.

An increase in expression of a gene product of at least one gene comprising a sequence 
shown in (a) identifies the test compound as a potential drug for increasing a function of 
a colon cell. An increase in expression of a gene product of at least one gene comprising 
a sequence shown in (b) identifies the test compound as a potential drug for increasing a 
function of a brain cell. An increase in expression of a gene product of at least one gene 
comprising a sequence shown in (c) identifies the test compound as a potential drug for 
increasing a function of a skin cell. An increase in expression of a gene product of at least 
one gene comprising a sequence shown in (d) identifies the test compound as a potential 
drug for increasing a function of a breast cell. An increase in expression of a gene product 
of at least one gene comprising a sequence shown in (e) identifies the test compound as a
potential drug for increasing a function of a lung cell. An increase in expression of a gene
product of at least one gene comprising a sequence shown in (f) identifies the test
compound as a potential drug for increasing a function of a melanocyte. An increase in 
expression of a gene product of at least one gene comprising a sequence shown in (g) 
identifies the test compound as a potential drug for increasing a function of a prostate cell. 
An increase in expression of a gene product of at least one gene comprising a sequence 
shown in (h) identifies the test compound as a potential drug for increasing a function of 
a kidney cell. 

Yet another embodiment of the invention is a method to restore function to a diseased 
tissue. A gene is delivered to a diseased cell selected from the group consisting of a colon 
epithelial cell, a brain cell, a keratinocyte, a breast epithelial cell, a lung epithelial cell, a 
melanocyte, a prostate cell, and a kidney cell. The gene comprises a nucleotide sequence 
selected from at least one of the following groups: 

(a) the sequences shown in SEQ ID NOS:2, 5-18, 20-84, and 85; 

(b) the sequences shown in SEQ ID NOS:87-96, 98, 100-103, 105, 107-110, 112-129, 
131-150, and 151; 

(c) the sequences shown in SEQ ID NOS:152-154 and 155; 

(d) the sequences shown in SEQ ID NOS:156-159 and 160;

(e) the sequences shown in SEQ ID NOS:161-166 and 167;

(f) the sequences shown in SEQ ID NOS:168, 170, 172-177, 179-188, 190-207, and 208;

(g) the sequences shown in SEQ ID NOS:209 and 210; and

(h) the sequences shown in SEQ ID NOS:211-224 and 225.

Expression of the gene in the diseased cell is less than expression of the gene in a 
corresponding cell which is normal. If the diseased cell is a colon epithelial cell, then the 
nucleotide sequence is selected from (a). If the diseased cell is a brain cell, then the 
nucleotide sequence is selected from (b). If the diseased cell is a keratinocyte, then the 
nucleotide sequence is selected from (c). If the diseased cell is a breast epithelial cell, then 
the nucleotide sequence is selected from (d). If the diseased cell is a lung epithelial cell, 
then the nucleotide sequence is selected from (e). If the diseased cell is a melanocyte, then
the nucleotide sequence is selected from (f). If the diseased cell is a prostate cell, then the 
nucleotide sequence is selected from (g). If the diseased cell is a kidney cell, then the 
nucleotide sequence is selected from (h). 

Thus, the invention provides transcriptomes, polynucleotides, and methods of 
identifying particular cell types, reducing cancer-specific gene expression, identifying 
cancer cells, standardizing gene expression, screening test compounds for the ability to 
increase an organ or a cell function, and restoring function to a diseased tissue. 

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1. Sampling of gene expression in colon cancer cells. Analysis of transcripts 
at increasing increments of transcript tags indicates that the fraction of new transcripts 
identified approaches 0 at approximately 650,000 total tags. 
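
The saturation analysis summarized in FIG. 1 can be illustrated with a minimal Python sketch. It assumes the tags are available as an ordered list and defines the fraction of new transcripts for each increment as the share of distinct tags in that increment that were not seen in any earlier increment. The function name, the increment size, and this exact definition are illustrative assumptions, not the procedure used to produce the figure.

```python
def new_transcript_fractions(tags: list[str], increment: int) -> list[float]:
    """Fraction of previously unseen distinct tags in each successive increment."""
    if increment <= 0:
        raise ValueError("increment must be a positive integer")
    seen: set[str] = set()
    fractions = []
    for start in range(0, len(tags), increment):
        batch = set(tags[start:start + increment])  # distinct tags in this increment
        new = batch - seen                          # tags not observed before
        fractions.append(len(new) / len(batch))
        seen |= batch
    return fractions


if __name__ == "__main__":
    # Toy tag stream (invented); real SAGE tags are 10 bp sequences.
    print(new_transcript_fractions(["A", "B", "A", "C", "A", "B", "B", "C"], 4))
```

As sampling deepens, the returned fractions fall toward 0, which corresponds to the plateau described for FIG. 1.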

FIG. 2. Colon cancer cell Rot curve. 

FIGS. 3A-3C. Gene expression in different tissues. FIG. 3A. Fold reduction or 
induction of unique transcripts for each of the comparisons analyzed. The source of the 
transcripts included in each comparison is displayed in FIG. 3C. The relative expression
of each transcript was determined by dividing the number of transcript tags in each 
comparison in the order displayed in FIG. 3C. To avoid division by 0, we used a tag value 
of 1 for any tag that was not detectable in one of the samples. We then rounded these ratios 
to the nearest integer; their distribution is plotted on the X axis. The number of transcripts 
displaying each ratio is plotted on the Y axis. Each comparison is represented by a specific 
color (see below or FIG. 3C). FIG. 3B. Expression of transcripts for each comparison, 
where values on X and Y axes represent the observed transcript tag abundances in each of 
the two compared sets. Light Blue symbols: DLD1 in different physiologic conditions; 
Yellow symbols: DLD1 cells (X axis) versus HCT116 cells (Y axis); Red symbols: colon
cancer cells (X axis) versus normal brain (Y axis); and Dark Blue symbols: colon cancer
cells (X axis) versus hemangiopericytoma (Y axis). FIG. 3C. Fraction of transcripts with
dramatically altered expression. For each comparison, Expression Change denotes the
number of transcripts induced or reduced 10 fold, and (%) denotes the number of altered
transcripts divided by the number of unique transcripts in each case. Differences between
expression changes were evaluated using the chi squared test, where the expected 
expression changes were assumed to be the average expression change for any two 
comparisons. 
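
The ratio calculation described for FIGS. 3A and 3C can be sketched in Python as follows. The substitution of a tag value of 1 for undetected tags and the rounding to the nearest integer follow the legend. The sign convention (negative values for reductions), the half-up rounding rule, the application of the 10-fold threshold to the rounded value, and all function names are assumptions made for illustration.

```python
import math


def fold_change(tags_first: int, tags_second: int) -> int:
    """Signed, rounded fold change between two tag counts.

    Undetected tags (count 0) are given a value of 1 to avoid division by 0.
    Positive values mean higher in the first sample, negative values mean
    higher in the second sample (sign convention assumed for this sketch).
    """
    first = max(tags_first, 1)
    second = max(tags_second, 1)
    if first >= second:
        return math.floor(first / second + 0.5)
    return -math.floor(second / first + 0.5)


def fraction_altered(tag_pairs, threshold: int = 10):
    """Count transcripts changed at least `threshold`-fold and their fraction."""
    if not tag_pairs:
        raise ValueError("at least one transcript is required")
    altered = sum(1 for a, b in tag_pairs if abs(fold_change(a, b)) >= threshold)
    return altered, altered / len(tag_pairs)


if __name__ == "__main__":
    # Invented tag counts for four transcripts.
    print(fraction_altered([(30, 0), (0, 12), (5, 5), (7, 2)]))
```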

TABLE LEGENDS 

Table 1. Table of tissues and transcript tags analyzed. "Tissues" represents the 
source of the RNA analyzed, "Libraries" indicates the number of SAGE libraries analyzed, 
"Total Transcripts" is the total number of transcripts analyzed from each tissue, and 
"Unique Transcripts" denotes the number of unique transcripts observed in each tissue. 

Table 2. Table of transcript abundance. "Copies/cell" denotes the category of 
expression level analyzed in transcript copies per cell, "Unique Transcripts" represents the
number of unique transcripts observed and those matching GenBank genes or ESTs, and 
"Mass fraction mRNA" represents the fraction of mRNA molecules contained in each 
expression category. 

Table 3. Table showing tissue-specific transcripts. The number in parentheses 
adjacent to the tissue type indicates the percent of transcripts exclusively expressed in a 
given tissue at 10 copies per cell. "Transcript tag" denotes the 10 bp tag adjacent to the 4 bp
NlaIII anchoring enzyme site, "Copies/cell" denotes the transcript copies per cell expressed,
and "UniGene Description" provides a functional description of each matching UniGene
cluster (from UniGene Build No. 67). As UniGene cluster numbers change over time, the
most recent cluster assignment for each tag can be obtained individually at
http://www.ncbi.nlm.nih.gov/SAGE/SAGEtag.cgi (Lal et al., "A public database for gene
expression in human cancers," Cancer Research, in press) or for the entire table at
http://www.sagenet.org/transcriptome. 

Table 4. Table showing ubiquitously expressed genes. "Copies/cell" denotes the 
average expression level of each transcript from all tissues examined, "Range" represents 
the range in expression for each transcript tag among all tissues analyzed in copies per cell, 
and "Range/ Avg" is the ratio of the range to the average expression level and provides a 
measure of uniformity of expression. Other table columns are the same as in Table 3. The
entire table of uniformly expressed transcripts also is available at
http://www.sagenet.org/transcriptome. 
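
The "Range/Avg" uniformity measure in the Table 4 legend can be illustrated with a short Python sketch. It assumes that "Range" is taken as the difference between the highest and lowest copies per cell across tissues; the function name and the example values are hypothetical.

```python
def uniformity(copies_per_cell: list[float]) -> tuple[float, float, float]:
    """Return (average, range, range / average) for one transcript across tissues."""
    if not copies_per_cell:
        raise ValueError("at least one tissue value is required")
    average = sum(copies_per_cell) / len(copies_per_cell)
    if average == 0:
        raise ValueError("average expression is zero; ratio is undefined")
    value_range = max(copies_per_cell) - min(copies_per_cell)  # assumed definition
    return average, value_range, value_range / average


if __name__ == "__main__":
    # Invented copies-per-cell values for one transcript in four tissues.
    print(uniformity([10, 12, 8, 10]))
```

A smaller ratio indicates more uniform expression across the tissues examined.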

Table 5. Table showing transcripts uniformly elevated in human cancers. 
Transcripts shown are expressed at 3 or more copies/cell and at a level at least 2-fold higher in each
cancer than in its corresponding normal tissue. CC, colon cancer; BC, brain cancer;
BrC, breast cancer; LC, lung cancer; M, melanoma; NC, normal colon epithelium; NB, 
normal brain; NBr, normal breast epithelium; NL, normal lung epithelium; NM, normal 
melanocytes. "Avg T/N" is the average ratio of expression in tumor tissue divided by 
normal tissue (for the purpose of obtaining this ratio, expression values of 0 are converted 
to 0.5). Other table columns are the same as in Table 3.
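
The "Avg T/N" calculation and the selection rule in the Table 5 legend can be sketched in Python as follows. The conversion of expression values of 0 to 0.5 follows the legend. Applying the copy-number threshold to the tumor value of every tumor/normal pair, and the function names, are assumptions made for illustration.

```python
def floor_zero(value: float, substitute: float = 0.5) -> float:
    """Replace an expression value of 0 with 0.5 so that ratios are defined."""
    return substitute if value == 0 else value


def avg_tumor_normal_ratio(pairs) -> float:
    """Average of tumor/normal expression ratios over (tumor, normal) pairs."""
    ratios = [floor_zero(t) / floor_zero(n) for t, n in pairs]
    return sum(ratios) / len(ratios)


def uniformly_elevated(pairs, min_copies: float = 3, min_fold: float = 2.0) -> bool:
    """True if every tumor value meets the copy threshold and fold threshold."""
    return all(
        t >= min_copies and floor_zero(t) / floor_zero(n) >= min_fold
        for t, n in pairs
    )


if __name__ == "__main__":
    # Invented copies-per-cell values: (tumor, normal) for two tissue pairs.
    pairs = [(10, 0), (6, 3)]
    print(avg_tumor_normal_ratio(pairs), uniformly_elevated(pairs))
```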

Table 6. Table showing transcripts expressed in colon cancer cells at a level of at 
least 500 copies per cell. 

Table 7. Table showing transcripts expressed at a level of at least 500 copies per
cell.



DETAILED DESCRIPTION OF THE INVENTION 

It is a discovery of the present invention that particular sets of expressed genes 
("transcriptomes") are expressed only in cancer cells; expression of these genes can be 
used, inter alia, to identify a test cell as cancerous and to screen for anti-cancer drugs. 
These cancer-specific genes can also provide targets for therapeutic intervention. 

It is another discovery of the invention that other transcriptomes are differentially 
associated with distinct cell types; expression of genes of these transcriptomes can therefore 
be used to identify a test cell as belonging to one of these distinct cell types. 

It is yet another discovery of the invention that genes of another transcriptome are 
expressed ubiquitously; expression of genes of this transcriptome can be used to 
standardize expression of other genes in a variety of gene expression assays. 

To identify the transcriptomes described herein we used the SAGE method, as 
described in Velculescu et al (1) and Velculescu et al (12), to analyze gene expression in 
a variety of different human cell and tissue types. The SAGE method is also described in 
U.S. Patents 5,866,330 and 5,695,937. A total of 84 SAGE libraries were generated from 
19 tissues (Table 1). Diseased tissues included cancers of the colon, pancreas, breast, lung,
and brain, as well as melanoma, hemangiopericytoma, and polycystic kidney disease. 
Normal tissues included epithelia of the colon, breast, lung, and kidney, melanocytes, 
chondrocytes, monocytes, cardiomyocytes, keratinocytes, and cells of prostate and brain 
white matter and astrocytes. 

A total of 3,496,829 transcript tags were analyzed and found to represent 134,135 
unique transcripts after correcting for sequencing errors (transcript data available at 
http://www.sagenet.org/transcriptome). Expression levels for these transcripts ranged from
0.3 to a high of 9,417 transcript copies per cell in lung epithelium. Comparison against the
GenBank and UniGene collections of characterized genes and expressed sequence tags 
(ESTs) revealed that 6,900 transcript tags matched known genes, while 65,735 matched 
ESTs. The remaining 61,500 transcript tags (46%) had no matches to existing databases 
and corresponded to previously uncharacterized or partially sequenced transcripts. 
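
The tag counts in this paragraph are internally consistent, as the following short Python check illustrates. The variable names are chosen for this sketch; the numbers are taken from the text above.

```python
total_unique = 134_135   # unique transcripts after error correction
known_genes = 6_900      # tags matching known genes
est_matches = 65_735     # tags matching ESTs
unmatched = 61_500       # tags with no database match

# The three categories account for every unique transcript.
assert known_genes + est_matches + unmatched == total_unique

# Share of tags without a database match; about 45.8, quoted as 46% in the text.
unmatched_percent = 100 * unmatched / total_unique
print(f"{unmatched_percent:.1f}%")
```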

Each of the genes or transcripts whose expression can be measured in the methods 
of the invention comprises a unique sequence of at least 10 contiguous nucleotides (the 
"SAGE tag"). Genes which are differentially expressed in colon, lung, kidney, and breast 
epithelial cells, brain cells, prostate cells, keratinocytes, or melanocytes are shown in Table 
3. Ubiquitously expressed genes are shown in Table 4. Transcripts which are expressed 
only in cancer tissues, e.g., colon cancer, breast cancer, brain cancer, liver cancer, and 
melanoma, are shown in Table 5. 

This information provides a heretofore unavailable picture of human transcriptomes.
These results, like the human genome sequence, provide basic information integral to future 
experimentation in normal and disease states. Because SAGE analyses provide absolute 
expression levels, future SAGE data can be directly integrated with those described here 
to provide progressively deeper insights into gene expression patterns. Eventually, a 
relatively complete description of the transcripts expressed in diverse cell types and in 
various physiologic states can be obtained. 

Isolated polynucleotides 

The invention provides isolated polynucleotides comprising either 
deoxyribonucleotides or ribonucleotides. Isolated DNA polynucleotides according to the
invention contain less than a whole chromosome and can be either genomic DNA or DNA
which lacks introns, such as cDNA. Isolated DNA polynucleotides can comprise a gene
or a coding sequence of a gene comprising a sequence as shown in SEQ ID NOS:1-1563,
such as polynucleotides which comprise a sequence selected from the group consisting of 
SEQ ID NOS:2, 5, 6, 8, 10, 12, 13, 15, 17, 18, 21, 24-26, 28, 30, 31, 34-36, 38, 40, 47-51, 
53-57, 59-62, 65-69, 71-76, 78, 80-84, 98, 103, 113, 115, 122, 129, 132, 134, 135, 140, 
144, 149, 150, 153-168, 174-176, 182, 185, 186, 188, 190, 200, 201, 205-213, 216-224, 
237, 239, 257, 263, 485, 487, 495, 499, 514, 586, 686, 751, 835, 844, 878, 910, 925, 932, 
951, 1000, 1005, 1070, 1122, 1130, 1170, 1173, 1187, 1189, 1200, 1213, 1220, 1237, 1257,
1264, 1273, 1293, 1300, 1320, 1367, 1371, 1401, 1403, 1404, 1406, 1418, and 1419. 

Any technique for obtaining a polynucleotide can be used to obtain isolated 
polynucleotides of the invention. Preferably the polynucleotides are isolated free of other 
cellular components such as membrane components, proteins, and lipids. They can be 
made by a cell and isolated, or synthesized using an amplification technique, such as PCR, 
or by using an automatic synthesizer. Methods for purifying and isolating polynucleotides 
are routine and are known in the art. 

Isolated polynucleotides also include oligonucleotide probes, which comprise at least 
one of the sequences shown in SEQ ID NOS:1-1563. An oligonucleotide probe is
preferably at least 10, 11, 12, 13, 14, 15, 20, 30, 40, or 50 or more nucleotides in length.
If desired, a single oligonucleotide probe can comprise 2, 3, 4, or 5 or more of the
sequences shown in SEQ ID NOS:1-1563. The probes may or may not be labeled. They
may be used, for example, as primers for amplification reactions, such as PCR, in Southern
or Northern blots, or for in situ hybridization.

Oligonucleotide probes of the invention can be made by expressing cDNA molecules
comprising one or more of the sequences shown in SEQ ID NOS:1-1563 in an expression
vector in an appropriate host cell. Alternatively, oligonucleotide probes can be synthesized 
chemically, for example using an automated oligonucleotide synthesizer, as is known in the 
art. 

Solid Supports Comprising Polynucleotides

Polynucleotides, particularly oligonucleotide probes, preferably are immobilized on 
a solid support. A solid support can be any surface to which a polynucleotide can be 
attached. Suitable solid supports include, but are not limited to, glass or plastic slides, 
tissue culture plates, microtiter wells, tubes, gene "chips," or particles such as beads,
including but not limited to latex, polystyrene, or glass beads. Any method known in the 
art can be used to attach a polynucleotide to a solid support, including use of covalent and 
non-covalent linkages, passive absorption, or pairs of binding moieties attached 
respectively to the polynucleotide and the solid support. 

Polynucleotides are preferably present on an array so that multiple polynucleotides 
can be simultaneously tested for hybridization to polynucleotides present in a single 
biological sample. The polynucleotides can be spotted onto the array or synthesized in situ 
on the array. Such methods include older technologies, such as "dot blot" and "slot blot" 
hybridization (53, 54), as well as newer "microarray" technologies (55-58). A single array 
contains at least one polynucleotide, but can contain more than 100, 500, 1,000, 10,000, or
100,000 or more different probes in discrete locations. 

Determining expression of a gene product

Each of the methods of the invention involves measuring expression of a gene 
product of at least one of the genes identified in Tables 3, 4, and 5 (SEQ ID NOS:1-1448).
If desired, expression of gene products of at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 50, 
75, 100, 125, 250, 500, 1,000, 1,250, or more genes can be determined. 

Either protein or RNA products of the disclosed genes can be determined. Either 
qualitative or quantitative methods can be used. The presence of protein products of the 
disclosed genes can be determined, for example, using a variety of techniques known to the 
art, including immunochemical methods such as radioimmunoassay, Western blotting, and 
immunohistochemistry. Alternatively, protein synthesis can be determined in vivo, in a cell
culture, or in an in vitro translation system by detecting incorporation of labeled amino 
acids into protein products. 

RNA expression can be determined, for example, using at least 1, 2, 3, 4, 5, 10, 15, 
20, 25, 30, 50, 75, 100, 125, 250, 500, 1,000, 5,000, 10,000, or 100,000 or more
oligonucleotide probes, either in solution or immobilized on a solid support, as described 
above. Expression of the disclosed genes is preferably determined using an array of 
oligonucleotide probes immobilized on a solid support. In situ hybridization can also be 
used to detect RNA expression. 

Identification of Cell Types 

Cell-type specific genes are expressed at a level greater than 10 copies per cell in a 
particular cell type, such as epithelial cells of the colon, breast, lung, and kidney, 
keratinocytes, melanocytes, and cells from the prostate and brain, but are not expressed in 
cells of other tissues. Such cell-type specific genes represent "cell-type specific 
transcriptomes." The fraction of cell-type-specific transcripts ranges from 0.05% in 
normal prostate to 1.76% in normal colon epithelium. Approximately 50% of these 
transcript tags match known genes or ESTs. The vast majority of these cell-type-specific
genes have not been previously reported in the literature to be cell-type specific. 

Cell type-specific genes are shown in Table 3. Genes which comprise the sequences 
shown in SEQ ID NOS:1-85 are uniquely expressed in colon epithelial cells. Genes which
comprise the sequences shown in SEQ ID NOS:86-151 are uniquely expressed in brain
cells. Genes which comprise the sequences shown in SEQ ID NOS:152-155 are uniquely
expressed in keratinocytes. Genes which comprise the sequences shown in SEQ ID
NOS:156-160 are uniquely expressed in breast epithelial cells. Genes which comprise the
sequences shown in SEQ ID NOS:161-167 are uniquely expressed in lung epithelial cells.
Genes which comprise the sequences shown in SEQ ID NOS:168-208 are uniquely
expressed in melanocytes. Genes which comprise the sequences shown in SEQ ID
NOS:209 and 210 are uniquely expressed in prostate cells. Genes which comprise the
sequences shown in SEQ ID NOS:211-225 are uniquely expressed in kidney epithelial
cells. Thus, determination of expression of at least one gene from each of these uniquely
expressed groups, particularly those not previously known to be uniquely expressed, can
be used to identify a test cell as an epithelial cell of the colon, breast, lung, or kidney, a
keratinocyte, a melanocyte, or a cell from the prostate or brain.

Test cells can be obtained, for example, from biopsy or surgical samples, forensic
samples, cell lines, or primary cell cultures. Test cells include normal as well as cancer 
cells, such as primary or metastatic cancer cells. 

To identify a test cell as an epithelial cell of the colon, breast, lung, or kidney, a
keratinocyte, a melanocyte, or a cell from the prostate or brain, expression of a gene
product of at least one gene is determined, using methods such as those described above.
If a test cell expresses a gene comprising a sequence shown in SEQ ID NOS:2, 5-18, and
20-85, the test cell is identified as a colon epithelial cell. If a test cell expresses a gene
comprising a sequence shown in SEQ ID NOS:87-96, 98, 100-103, 105, 107-110, 112-129,
and 131-151, the test cell is identified as a brain cell. If a test cell expresses a gene
comprising a sequence shown in SEQ ID NOS:152-155, the test cell is identified as a
keratinocyte. If a test cell expresses a gene comprising a sequence shown in SEQ ID
NOS:156-160, the test cell is identified as a breast epithelial cell. If a test cell expresses
a gene comprising a sequence shown in SEQ ID NOS:161-167, the test cell is identified as
a lung epithelial cell. Expression of a gene comprising a sequence shown in SEQ ID
NOS:168, 170, 172-177, 179-188, and 190-208 identifies the test cell as a melanocyte.
Expression of a gene comprising a sequence shown in SEQ ID NOS:209 and 210 identifies
the test cell as a prostate cell. Expression of a gene which comprises a sequence shown in
SEQ ID NOS:211-225 identifies the test cell as a kidney epithelial cell.
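
The mapping from expressed sequences to cell types described above can be expressed as a simple lookup, sketched here in Python. The SEQ ID NO ranges are transcribed from this section. The data structure, the function name, and the idea of supplying the expressed genes as a set of SEQ ID numbers are assumptions made for illustration.

```python
# Inclusive SEQ ID NO ranges for each cell type, as listed in this section.
CELL_TYPE_SEQ_IDS = {
    "colon epithelial cell": [(2, 2), (5, 18), (20, 85)],
    "brain cell": [(87, 96), (98, 98), (100, 103), (105, 105),
                   (107, 110), (112, 129), (131, 151)],
    "keratinocyte": [(152, 155)],
    "breast epithelial cell": [(156, 160)],
    "lung epithelial cell": [(161, 167)],
    "melanocyte": [(168, 168), (170, 170), (172, 177), (179, 188), (190, 208)],
    "prostate cell": [(209, 210)],
    "kidney epithelial cell": [(211, 225)],
}


def identify_cell_types(expressed_seq_ids: set[int]) -> set[str]:
    """Return every cell type for which at least one listed sequence is expressed."""
    matches = set()
    for cell_type, ranges in CELL_TYPE_SEQ_IDS.items():
        if any(low <= seq_id <= high
               for seq_id in expressed_seq_ids
               for low, high in ranges):
            matches.add(cell_type)
    return matches


if __name__ == "__main__":
    # Hypothetical test cell expressing genes comprising SEQ ID NOS:153 and 1000.
    print(identify_cell_types({153, 1000}))
```

Returning a set rather than a single label makes it visible when the measured genes match more than one group or none at all.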



Identifying a Test Cell as a Cancer Cell 

A cancer-specific gene is expressed at a level of at least 3 copies per cancer cell (such 
as a colon cancer, breast cancer, brain cancer, lung cancer, or melanoma cell) and at a level 
which is at least two-fold higher than expression of the same gene in a corresponding 
normal cell. Cancer-specific genes which comprise the sequences shown in SEQ ID 
NOS:226-265 (Table 5) represent a "cancer transcriptome." SEQ ID NOS:237, 239, 257, 
and 263 are sequences which are found in transcripts of novel cancer-specific genes of the 
invention. Oligonucleotide probes corresponding to cancer-specific genes can be used, for 
example, to detect and/or measure expression of cancer-specific genes for diagnostic 
purposes, to assess efficacy of various treatment regimens, and to screen for potential 
anti-cancer drugs. 

For example, determination of the expression level of any of these genes in a test cell 
relative to the expression level of the same gene in a normal cell (a cell which is known not 
to be a cancer cell) can be used to determine whether the test cell is a cancer cell or a non- 
cancer cell. 

Test cells can be any human cell suspected of being a cancer cell, including but not 
limited to a colon epithelial cell, a breast epithelial cell, a lung epithelial cell, a kidney 
epithelial cell, a melanocyte, a prostate cell, and a brain cell. Test cells can be obtained, 
for example, from biopsy samples, surgically excised tissues, forensic samples, cell lines, 
or primary cell cultures. Comparison can be made to a non-cancer cell type, including to 
the corresponding non-cancer cell type, either at the time expression is measured in the test 
cell or by reference to a previously determined expression standard. 

To identify a test cell as a cancer cell, expression of a gene product of at least one 
gene is determined, using methods such as those described above. The at least one gene 
comprises a sequence selected from the group consisting of SEQ ID NOS:226-265, 
particularly from the group consisting of SEQ ID NOS:228, 230-236, 238, 240-256, 258- 
260, and 262-265. An increase in expression of the at least one gene in the test cell which 
is at least two-fold more than the expression of the at least one gene in a cell which is not 
cancerous identifies the test cell as a cancer cell. 
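
The two-fold criterion above can be sketched as a simple comparison. The function name `is_cancer_cell` and the handling of a normal-cell level of zero are assumptions made for illustration; the text does not specify how undetectable expression in the normal cell is treated.

```python
def is_cancer_cell(test_level, normal_level, fold_threshold=2.0):
    """Return True if expression in the test cell is at least
    `fold_threshold` times the expression in the non-cancerous cell."""
    if test_level < 0 or normal_level < 0:
        raise ValueError("expression levels must be non-negative")
    if normal_level == 0:
        # Assumption: with no detectable expression in the normal cell,
        # any detectable expression in the test cell meets the criterion.
        return test_level > 0
    return test_level / normal_level >= fold_threshold
```

Both levels must be measured in the same units, for example transcript copies per cell, and the normal-cell level can come from a previously determined expression standard.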

Reducing Cancer-Specific Gene Expression 

Cancer-specific genes provide potential therapeutic targets for treating cancer or for 
use in model systems, for example, to screen for agents which will enhance the effect of 
a particular compound on a potential therapeutic target. Thus, a reagent can be 
administered to a human cell, either in vitro or in vivo, to reduce expression of a cancer- 
specific gene. The reagent specifically binds to an expression product of a gene comprising 
a sequence selected from the group consisting of SEQ ID NOS:226-265, particularly from 
the group consisting of SEQ ID NOS:228, 230-236, 238, 240-256, 258-260, and 262-265. 

If the expression product is a protein, the reagent is preferably an antibody. Protein 
products of cancer-specific genes can be used as immunogens to generate antibodies, such 
as polyclonal, monoclonal, or single-chain antibodies, as is known in the art. Protein 
products of cancer-specific genes can be isolated from primary or metastatic tumors, such 
as primary colon adenocarcinomas, lung cancers, astrocytomas, glioblastomas, breast 
cancers, and melanomas. Alternatively, protein products can be prepared from cancer cell 
lines such as SW480, HCT116, DLD1, HT29, RKO, 21-PT, MDA-468, A549, and the like. 
If desired, cancer-specific gene coding sequences can be expressed in a host cell or in an 
in vitro translation system. An antibody which specifically binds to a protein product of 
a cancer-specific gene provides a detection signal at least 5-, 10-, or 20-fold higher than a 
detection signal provided with other proteins when used in an immunochemical assay. 
Preferably, the antibody does not detect other proteins in immunochemical assays and can 
immunoprecipitate the cancer-specific protein product from solution. 

For administration in vitro, an antibody can be added to a tissue culture preparation, 
either as a component of the medium or in addition to the medium. In another 
embodiment, antibodies are delivered to specific tissues in vivo using receptor-mediated 
targeted delivery. Receptor-mediated DNA delivery techniques are taught in, for example, 
Findeis et al., Trends in Biotechnol. 11, 202-05 (1993); Chiou et al., Gene Therapeutics: 
Methods and Applications of Direct Gene Transfer (J.A. Wolff, ed.) (1994); Wu & 
Wu, J. Biol. Chem. 263, 621-24, 1988; Wu et al., J. Biol. Chem. 269, 542-46, 1994; Zenke 
et al., Proc. Natl. Acad. Sci. U.S.A. 87, 3655-59, 1990; Wu et al., J. Biol. Chem. 266, 338- 
42, 1991. 

If single-chain antibodies are used, polynucleotides encoding the antibodies can be 
constructed and introduced into cells using well-established techniques including, but not 
limited to, transferrin-polycation-mediated DNA transfer, transfection with naked or 
encapsulated nucleic acids, liposome-mediated cellular fusion, intracellular transportation 
of DNA-coated latex beads, protoplast fusion, viral infection, electroporation, "gene gun," 
and DEAE- or calcium phosphate-mediated transfection. 

Effective in vivo dosages of an antibody are in the range of about 5 μg to about 50 
μg/kg of patient body weight, about 50 μg to about 5 mg/kg, about 100 μg to about 500 
μg/kg of patient body weight, and about 200 to about 250 μg/kg. For administration of 
polynucleotides encoding single-chain antibodies, effective in vivo dosages are in the range 
of about 100 ng to about 200 ng, 500 ng to about 50 mg, about 1 μg to about 2 mg, about 
5 pg to about 500 ng, and about 20 ng to about 100 μg of DNA. 

If the expression product is mRNA, the reagent is preferably an antisense 
oligonucleotide. The nucleotide sequence of an antisense oligonucleotide is 
complementary to at least a portion of the sequence of the cancer-specific gene. Preferably, 
the antisense oligonucleotide sequence is at least 10 nucleotides in length, but can be at 
least 11, 12, 15, 20, 25, 30, 35, 40, 45, or 50 or more nucleotides long. Longer sequences 
also can be used. An antisense oligonucleotide which specifically binds to an mRNA 
product of a cancer-specific gene preferably hybridizes with no more than 3 or 2 
mismatches, preferably with no more than 1 mismatch, even more preferably with no 
mismatches. 

Antisense oligonucleotides can be deoxyribonucleotides, ribonucleotides, or a 
combination of both. Oligonucleotides, including modified oligonucleotides, can be 
prepared by methods well known in the art (47-52) and introduced into human cells using 
techniques such as those described above. The cells can be in a primary culture of human 
tumor cells, in a human tumor cell line, or can be primary or metastatic tumor cells present 
in a human body. 

Preferably, a reagent reduces expression of a cancer-specific gene by at least 10%, 
20%, 30%, 40%, 50%, 60%, 70%, or 80% relative to expression of the gene in the absence 
of the reagent. Most preferably, the level of gene expression is decreased by at least 90%, 
95%, 99%, or 100%. The effectiveness of the mechanism chosen to decrease the level of 
expression of a cancer-specific gene can be assessed using methods well known in the art, 
such as hybridization of nucleotide probes to cancer-specific gene mRNA, quantitative RT- 
PCR, or immunologic detection of a protein product of the cancer-specific gene. 
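
The percentage reductions recited above are relative to expression in the absence of the reagent. A minimal sketch of that calculation follows; the function name `percent_reduction` is hypothetical, and both levels are assumed to be measured by the same assay in the same units.

```python
def percent_reduction(untreated_level, treated_level):
    """Percent decrease in expression relative to expression without the reagent."""
    if untreated_level <= 0:
        raise ValueError("untreated expression level must be positive")
    return 100.0 * (untreated_level - treated_level) / untreated_level
```

A treated level equal to the untreated level gives 0%, and complete loss of expression gives 100%.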

Screening for Anti-Cancer Drugs 

According to the invention, test compounds can be screened for potential use as anti- 
cancer drugs by assessing their ability to suppress or decrease the expression of at least one 
cancer-specific gene. The cancer-specific gene comprises a sequence selected from the 
group consisting of SEQ ID NOS:226-265, particularly from the group consisting of SEQ 
ID NOS:228, 230-236, 238, 240-256, 258-260, and 262-265. Test compounds can be 
pharmacologic agents already known in the art or can be compounds previously unknown 
to have any pharmacological activity, including small molecules from compound libraries. 
Test substances can be naturally occurring or designed in the laboratory. They can be 
isolated from microorganisms, animals, or plants, or can be produced recombinantly or 
synthesized by chemical methods known in the art. 

To screen a test compound for use as a possible anti-cancer drug, a cancer cell is 
contacted with the test compound. The cancer cell can be a cell of a primary or metastatic 
tumor, such as a tumor of the colon, breast, lung, prostate, brain, or kidney, or a melanoma, 
which is isolated from a patient. Alternatively, a cancer cell line, such as colon cancer cell 
lines HCT116, DLD1, HT29, Caco2, SW837, SW480, and RKO, breast cancer cell lines 
21-PT, 21-MT, MDA-468, SK-BR3, and BT-474, the A549 lung cancer cell line, and the 
H392 glioblastoma cell line, can be used. 

Expression of a gene product of at least one gene is determined using methods such 
as those described above. The gene comprises a sequence selected from the group 
consisting of SEQ ID NOS:226-265, preferably from the group consisting of SEQ ID 
NOS:228, 230-236, 238, 240-256, 258-260, and 262-265, even more preferably from the 
group consisting of SEQ ID NOS:237, 239, 257, and 263. A decrease in expression of the 
gene in the cancer cell identifies the test compound as a potential anti-cancer drug. 



Standardizing Expression of a Test Gene 

Genes which comprise the sequences shown in SEQ ID NOS:266-1448 (Table 4) are 
expressed at a level of at least five transcript copies per cell in every cell type analyzed, 
including epithelia of the colon, breast, lung, and kidney, melanocytes, chondrocytes, 
monocytes, cardiomyocytes, keratinocytes, prostate cells, and astrocytes, oligodendrocytes, 
and other cells present in the white matter of brain. These genes thus represent members 
of the "minimal transcriptome," the set of genes expressed in all human cells. The minimal 
transcriptome includes well known genes which are often used as experimental controls to 
normalize gene expression, such as glyceraldehyde 3-phosphate dehydrogenase, elongation 
factor 1 alpha, and gamma actin. 

Ubiquitously expressed genes can be used to compare expression of a test gene in a 
test sample to expression of a gene in a standard sample. A ubiquitously expressed gene 
preferably comprises a sequence shown in SEQ ID NOS:266-375, 377-652, 654-796, and 
798-1448, and more preferably comprises a sequence shown in SEQ ID NOS:282, 288, 
300, 302, 308, 320, 323, 363, 368, 379, 381, 444, 453, 518, 531, 535, 538, 542, 579, 580, 
594, 600, 604, 617, 626, 641, 650, 717, 728, 776, 777, 794, 818, 822, 842, 885, 887, 899, 
900, 902, 904, 914, 930, 960, 964, 1001, 1015, 1020, 1027, 1035, 1090, 1113, 1119, 1146, 
1151, 1163, 1233, 1235, 1252, 1255, 1270, 1340, 1345, 1356, 1359, 1360, 1362, 1385, 
1415, and 1441. 

Two ratios are determined using gene expression assays such as those described 
above. The first ratio is an amount of an expression product of a test gene in a test sample 
to an amount of an expression product of at least one ubiquitously expressed gene 
comprising a sequence selected from the group consisting of SEQ ID NOS:266-375, 377- 
652, 798-1447, and 1448 in the test sample. The second ratio is an amount of an 
expression product of the test gene in a standard sample to an amount of an expression 
product of the ubiquitously expressed gene in the standard sample. Expression of either 
the test gene or the ubiquitously expressed gene can be used as the denominator. If desired, 
multiple ratios can be determined, such as (a) an amount of an expression product of more 
than one test gene to that of a single ubiquitously expressed gene, (b) an amount of an 
expression product of a single test gene to that of more than one ubiquitously expressed 
gene, or (c) an amount of an expression product of more than one test gene to that of more 
than one ubiquitously expressed gene. Optionally, the ratio in the standard sample can be 
pre-determined. 

The ratios determined in the test and standard samples are compared. A difference 
between the ratios indicates a difference in the amount of the expression product of the test 
gene in the test sample. 
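
The comparison of the two ratios can be sketched as follows. This is an illustrative sketch, not the method of the invention as claimed; the names `expression_ratio` and `compare_to_standard` are hypothetical, and the ubiquitously expressed gene is used as the denominator, which the text permits but does not require.

```python
def expression_ratio(test_gene_amount, reference_gene_amount):
    """Ratio of a test gene product to a ubiquitously expressed gene product."""
    if reference_gene_amount <= 0:
        raise ValueError("reference gene amount must be positive")
    return test_gene_amount / reference_gene_amount


def compare_to_standard(test_sample, standard_sample):
    """Each sample is a (test_gene_amount, reference_gene_amount) pair.

    Returns the first ratio (test sample) divided by the second ratio
    (standard sample). A value other than 1 indicates a difference in the
    amount of the test gene's expression product in the test sample."""
    first_ratio = expression_ratio(*test_sample)
    second_ratio = expression_ratio(*standard_sample)
    if second_ratio == 0:
        raise ValueError("test gene is not expressed in the standard sample")
    return first_ratio / second_ratio
```

With a test sample of `(40.0, 10.0)` and a standard sample of `(20.0, 10.0)`, the function returns `2.0`, meaning the test gene product is twice as abundant in the test sample after normalization. A pre-determined standard ratio can be substituted by passing it as `(ratio, 1.0)`.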

The standard and test samples can be matched samples, such as whole cell cultures 
or homogenates of cells (such as a biopsy sample) and differ only in that the test biological 
sample has been subjected to a different environmental condition, such as a test compound, 
a drug whose effect is known or unknown, or altered temperature or other environmental 
condition. Alternatively, the test and standard samples can be corresponding cell types 
which differ according to developmental age. In one embodiment, the test sample is a 
cancer cell, such as a colon cancer, breast cancer, lung cancer, melanoma, or brain cancer 
cell, and the standard sample is a normal cell. 

The test gene can be a gene which encodes a protein whose biological function is 
known or unknown. Preferably the ratio of expression between the test gene and 
expression of the ubiquitously expressed gene is consistent in the standard sample. Even 
more preferably, expression of the ubiquitously expressed gene is not altered in the test 
sample. A difference between the first ratio of expression in the test sample and a second 
ratio of expression in the standard sample can therefore be used to indicate a difference in 
expression of the test gene in the test sample. 

Screening for Compounds for Increasing an Organ or Cell Function 

Test compounds can be screened for the ability to increase an organ or cell function 
by assessing their ability to increase expression of at least one tissue-specific gene. The 
tissue-specific gene comprises a sequence selected from at least one of the following 
groups: 

(a) the sequences shown in SEQ ID NOS:2, 5-18, 20-84, and 85; 

(b) the sequences shown in SEQ ID NOS:87-96, 98, 100-103, 105, 107-110, 112-129, 
131-150, and 151; 

(c) the sequences shown in SEQ ID NOS:152-154 and 155; 

(d) the sequences shown in SEQ ID NOS:156-159 and 160; 

(e) the sequences shown in SEQ ID NOS:161-166 and 167; 

(f) the sequences shown in SEQ ID NOS:168, 170, 172-177, 179-188, 190-207, and 
208; 

(g) the sequences shown in SEQ ID NOS:209 and 210; and 

(h) the sequences shown in SEQ ID NOS:211-224 and 225. 

As with the anti-cancer drug screening method described above, test compounds can be 
pharmacologic agents already known in the art or can be compounds previously unknown 
to have any pharmacological activity, including small molecules from compound libraries. 
Test substances can be naturally occurring or designed in the laboratory. They can be 
isolated from microorganisms, animals, or plants, or can be produced recombinantly or 
synthesized by chemical methods known in the art. 

To screen a test compound for the ability to increase an organ or cell function, a cell, 
such as a colon epithelial cell, a brain cell, a keratinocyte, a breast epithelial cell, a lung 
epithelial cell, a melanocyte, a prostate cell, or a kidney cell, is contacted with the test 
compound. The cell can be a primary culture, such as an explant culture, of tissue obtained 
from a human, or can originate from an established cell line. 

Expression of a gene product of at least one gene is determined using methods such 
as those described above. An increase in expression of a gene product of at least one gene 
comprising a sequence selected from (a) identifies the test compound as a potential drug 
for increasing a function of a colon cell. An increase in expression of a gene product of at 
least one gene comprising a sequence selected from (b) identifies the test compound as a 
potential drug for increasing a function of a brain cell. An increase in expression of a gene 
product of at least one gene comprising a sequence selected from (c) identifies the test 
compound as a potential drug for increasing a function of a skin cell. An increase in 
expression of a gene product of at least one gene comprising a sequence selected from (d) 
identifies the test compound as a potential drug for increasing a function of a breast cell. 
An increase in expression of a gene product of at least one gene comprising a sequence 
selected from (e) identifies the test compound as a potential drug for increasing a function 
of a lung cell. An increase in expression of a gene product of at least one gene comprising 
a sequence selected from (f) identifies the test compound as a potential drug for increasing 
a function of a melanocyte. An increase in expression of a gene product of at least one 
gene comprising a sequence selected from (g) identifies the test compound as a potential 
drug for increasing a function of a prostate cell. An increase in expression of a gene 
product of at least one gene comprising a sequence selected from (h) identifies the test 
compound as a potential drug for increasing a function of a kidney cell. 

Restoring Function to a Diseased Tissue or Cell 

Function can be restored to a diseased tissue or cell, such as a melanocyte or a colon, 
brain, keratinocyte, breast, lung, prostate, or kidney cell, by delivering an appropriate 
tissue-specific gene to cells of that tissue. The tissue-specific gene comprises a nucleotide 
sequence selected from at least one of the following groups: 

(a) the sequences shown in SEQ ID NOS:2, 5-18, 20-84, and 85 (colon-specific); 

(b) the sequences shown in SEQ ID NOS:87-96, 98, 100-103, 105, 107-110, 112-129, 
131-150, and 151 (brain-specific); 

(c) the sequences shown in SEQ ID NOS:152-154 and 155 (keratinocyte-specific); 

(d) the sequences shown in SEQ ID NOS:156-159 and 160 (breast-specific); 

(e) the sequences shown in SEQ ID NOS:161-166 and 167 (lung-specific); 

(f) the sequences shown in SEQ ID NOS:168, 170, 172-177, 179-188, 190-207, and 
208 (melanocyte-specific); 

(g) the sequences shown in SEQ ID NOS:209 and 210 (prostate-specific); and 

(h) the sequences shown in SEQ ID NOS:211-224 and 225 (kidney-specific). 

Expression of the gene in a cell of the diseased tissue preferably is 10, 20, 30, 40, 50, 60, 
70, 80, or 90% less than expression of the gene in a cell of the corresponding tissue which 
is normal. In some cases, the diseased cell fails to express the gene. A tissue-specific gene 
which is administered to cells for this purpose includes a polynucleotide comprising a 
coding sequence which is intron-free, such as a cDNA, as well as a polynucleotide which 
comprises elements in addition to the coding sequence, such as regulatory elements. 

Coding sequences of many of the tissue-specific genes disclosed herein are publicly 
available. For the novel tissue-specific genes identified here, coding sequences can be 
obtained using a variety of methods, such as restriction-site PCR (Sarkar, PCR Methods 
Applic. 2:318-322, 1993), inverse PCR (Triglia et al., Nucleic Acids Res. 16:8186, 1988), 
and capture PCR (Lagerstrom et al., PCR Methods Applic. 1:111-119, 1991). Alternatively, 
the partial sequences disclosed herein can be nick-translated or end-labeled with 32P using 
polynucleotide kinase and labeling methods known to those with skill in the art (Basic 
Methods in Molecular Biology, Davis et al., eds., Elsevier Press, N.Y., 1986). A 
lambda library prepared from the appropriate human tissue can then be directly screened 
with the labelled sequences of interest. 

Many methods for introducing polynucleotides into cells or tissues are available and 
can be used to deliver a tissue-specific gene to a cell in vitro or in vivo. Introduction of the 
tissue-specific gene into a cell can be accomplished by any method by which a nucleic acid 
molecule can be inserted into a cell, such as transfection, electroporation, microinjection, 
lipofection, adsorption, and protoplast fusion. For in vitro administration, a tissue-specific 
gene can be added to a tissue culture preparation, either as a component of the medium or 
in addition to the medium. In vivo administration can be by means of direct injection of 
a vector comprising a tissue-specific gene to the particular tissue or cells to which the 
tissue-specific gene is to be delivered. Alternatively, the tissue-specific gene can be 
included in a vector which is capable of targeting a particular tissue and administered 
systemically (59-61). 

For in vitro administration, suitable concentrations of a tissue-specific gene in the 
culture medium range from at least about 10 pg to 100 pg/ml, about 100 pg to about 500 
pg/ml, about 500 pg to about 1 ng/ml, about 1 ng to about 10 ng/ml, about 10 ng to about 
100 ng/ml, or about 100 ng/ml to about 500 ng/ml. For local administration, effective 
dosages of a tissue-specific gene range from at least about 10 ng to about 100 ng, about 50 
ng to 150 ng, about 100 ng to about 250 ng, about 1 μg to about 10 μg, about 5 μg to about 
50 μg, about 25 μg to about 100 μg, about 75 μg to about 250 μg, about 100 μg to about 
250 μg, about 200 μg to about 500 μg, about 500 μg to about 1 mg, about 1 mg to about 
10 mg, about 5 mg to about 50 mg, about 25 mg to about 100 mg, or about 50 mg to about 
200 mg of DNA per injection. Suitable concentrations for systemic administration range 
from at least about 500 ng to about 50 mg, about 1 μg to about 2 mg, about 5 μg to about 
500 μg, and about 20 μg to about 100 μg of DNA per kg of body weight. 

Recombinant DNA technologies can be used to improve expression of the tissue- 
specific gene by manipulating, for example, the number of copies of the gene in the cell, 
the efficiency with which the gene is transcribed, the efficiency with which the resultant 
transcripts are translated, and the efficiency of post-translational modifications. 
Recombinant techniques useful for increasing the expression of a tissue-specific gene in 
a cell include, but are not limited to, providing the tissue-specific gene in a high-copy 
number plasmid, integrating the tissue-specific gene into one or more host cell 
chromosomes, adding vector stability sequences to plasmids, substituting or modifying 
transcription control signals (e.g., promoters, operators, enhancers), substituting or 
modulating translational control signals (e.g., ribosome binding sites, Shine-Dalgarno 
sequences), and deleting sequences that destabilize transcripts. (See Dow et al., U.S. Patent 
5,935,568). 

Preferably, delivery of the tissue-specific gene increases expression of a gene product 
of the tissue-specific gene in the cell or tissue by at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 
95, 98, 99, or 100% relative to expression of the tissue-specific gene in a diseased cell or 
tissue to which the gene has not been delivered. Expression of a protein product of the 
tissue-specific gene can be determined immunologically, using methods such as 
radioimmunoassay, Western blotting, and immunohistochemistry. Alternatively, 
incorporation of labeled amino acids into a protein product can be determined. RNA 
expression is preferably determined using one or more oligonucleotide probes, either in 
solution or immobilized on a solid support, as described above. 

All documents cited in this disclosure are expressly incorporated herein. The above 
disclosure generally describes the present invention, and all references cited in this 
disclosure are incorporated by reference herein. A more complete understanding can be 
obtained by reference to the following specific examples which are provided for purposes 
of illustration only and are not intended to limit the scope of the invention. 



EXAMPLE 1 

Tissue samples and the SAGE method 

RNA for normal tissues was obtained from the following sources: colon epithelial 
cells isolated from sections of normal colon mucosa from two patients (41); HaCaT 
keratinocyte cells (42); normal mammary epithelial cells from two individuals (Clonetics); 
normal bronchial epithelial cells from two individuals (43); normal melanocytes from two 
individuals (Cascade Biologics); normal cultured monocytes, dendritic cells and TNF- 
activated dendritic cells; two normal kidney epithelial cell lines; cultured chondrocyte cells 
from two normal individuals and one patient with osteoarthritic disease; normal fetal 
cardiomyocytes in normoxic and hypoxic conditions; and normal brain white matter from 
two patients and normal cultured astrocyte cells. 

RNA for diseased tissues was obtained from the following sources: primary colon 
adenocarcinomas from two patients, HCT116, DLD1, HT29, Caco2, SW837, SW480, and 
RKO colon cancer cell lines cultured in vitro in a variety of different cellular conditions 
including log phase growth, G1/G2 phase growth arrest, and apoptosis (40, 41, 44, 45); 
primary pancreatic adenocarcinomas from two patients and ASPC-1 and PL-45 pancreatic 
cancer cell lines (41); breast cancer cell lines 21-PT, 21-MT, MDA-468, SK-BR3, and 
BT-474; primary lung squamous cell cancers from two patients (43), primary lung 
adenocarcinoma from one patient, and the A549 lung cancer cell line (43); primary 
melanomas from 3 patients; kidney epithelial cell lines from two patients with polycystic 
kidney disease; hemangiopericytomas from 5 patients; primary glioblastoma tumors from 
two patients; and the H392 glioblastoma cell line. 

Isolation of polyadenylate RNA and the SAGE method for all tissues was performed 
as previously described (1, 12; see also U.S. Patents 5,866,330 and 5,695,937). 



EXAMPLE 2 

Data analysis 

The SAGE software (12) was used to analyze raw sequence data and to identify a 
total of 3,668,175 SAGE tags. Of these, 171,346 tags (4.7%) corresponded to linker 
sequences and were removed from further analysis. The remaining 3,496,829 tags were 
derived from transcript sequences, but a small fraction of these contained sequencing 
errors. SAGE analysis of yeast (1), for which the entire genome sequence is known, 
demonstrated a sequencing error rate of ~0.7% per bp, translating to a tag error rate of 6.8% 
(1 - 0.993^10), in accord with sequence errors measured in the current data set. 
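
The 6.8% figure follows from treating each of the 10 bases of a tag as an independent opportunity for a sequencing error. A short check of the arithmetic, with the hypothetical function name `tag_error_rate`:

```python
def tag_error_rate(per_base_error=0.007, tag_length=10):
    """Probability that a tag of `tag_length` bases contains at least one
    error, assuming errors at each base are independent."""
    return 1.0 - (1.0 - per_base_error) ** tag_length


print(f"{tag_error_rate():.1%}")  # prints 6.8%
```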

To provide as accurate an estimate of unique genes as possible, we accounted for 
sequencing errors in two ways. First, we only considered tags that occurred at least twice in the 
data set. Although this requirement might have removed legitimate transcript tags 
expressed at very low levels (less than approximately 0.2 copies per cell, or 2 copies in 
3,496,829 transcript tags), it eliminated the majority of sequencing errors (172,276 tags). 

Second, because of the size of the data set utilized, it was possible that the same 
sequencing error in a given tag might be observed multiple times. To account for these, tags 
with expression levels high enough to give multiple redundant errors were analyzed for 
single base substitutions, insertions, and deletions. If the observed expression level of a tag 
did not exceed its expected incidence due to redundant errors by a factor of five, it was 
assumed to be the result of a repeated sequencing error. This identified and removed an 
additional 27,051 unique tags (156,174 total tags), a number very similar to estimates of 
multiple sequencing errors obtained by Monte Carlo simulations. 
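
The redundant-error filter described above can be sketched in simplified form. The sketch below is not the SAGE software; it handles single-base substitutions only (insertions and deletions are omitted), and it assumes that the expected incidence of an erroneous tag is the count of each neighboring tag multiplied by the per-base error rate and divided by three, on the assumption that an error is equally likely to produce any of the three alternative bases. The names `substitution_neighbors` and `filter_redundant_errors` are hypothetical.

```python
BASES = "ACGT"


def substitution_neighbors(tag):
    """Yield every tag that differs from `tag` by exactly one base substitution."""
    for i, base in enumerate(tag):
        for alt in BASES:
            if alt != base:
                yield tag[:i] + alt + tag[i + 1:]


def filter_redundant_errors(tag_counts, per_base_error=0.007, factor=5.0):
    """Keep tags whose observed count exceeds `factor` times the count
    expected from substitution errors in neighboring tags."""
    kept = {}
    for tag, observed in tag_counts.items():
        # Assumption: each neighbor contributes count * per_base_error / 3
        # erroneous copies of this particular tag.
        expected = sum(
            tag_counts.get(neighbor, 0) * per_base_error / 3.0
            for neighbor in substitution_neighbors(tag)
        )
        if observed > factor * expected:
            kept[tag] = observed
    return kept
```

Under these assumptions, a tag seen 100 times that differs by one base from a tag seen 30,000 times has an expected error incidence of 70, does not exceed it five-fold, and is removed, whereas a tag with no abundant neighbor is retained.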

In total, these corrections amount to a sequencing error rate of approximately 9.4%, 
suggesting that our analyses more than fully accounted for sequencing errors and that the 
remaining 134,135 unique transcript tags represented a conservative accounting of 
legitimate transcripts. 
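
The approximately 9.4% figure appears to follow from the two groups of removed tags reported above, taken as a fraction of all transcript tags. A short check under that assumption (the variable names are illustrative):

```python
singleton_tags_removed = 172_276   # tags removed by the twice-observed requirement
redundant_tags_removed = 156_174   # total tags removed as repeated sequencing errors
total_transcript_tags = 3_496_829  # tags remaining after linker removal

removed_fraction = (singleton_tags_removed + redundant_tags_removed) / total_transcript_tags
print(f"{removed_fraction:.1%}")  # prints 9.4%
```

This exceeds the 6.8% tag error rate expected from the yeast data, which is consistent with the statement that the corrections more than fully account for sequencing errors.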

Transcript tags were matched to known genes and ESTs by use of tables containing 
matching 10 bp transcript sequences, UniGene clusters, GenBank accession numbers, and 
functional descriptions downloaded from the SAGEmap web site 
(http://www.ncbi.nlm.nih.gov/SAGE) (Lai et al., in press) on Feb 23, 1999 (UniGene build 
70, http://www.ncbi.nlm.nih.gov/UniGene), and the Microsoft Access software. As 
UniGene cluster numbers may change over time, the most recent tag to cluster mapping 
can be obtained for each transcript tag individually at 
http://www.ncbi.nlm.nih.gov/SAGE/SAGEtag.cgi, or for the entire data set at 
http://www.sagenet.org/transcriptome. A total of 37,534 distinct transcripts from the 
UniGene database contained polyadenylation signals or polyadenylated tails and matched 
the collection of SAGE transcript tags; these corresponded to 23,534 unique UniGene 
clusters. 

Transcript abundance per cell was determined simply by dividing the observed 
number of tags for a given transcript by the total number of transcripts obtained. An 
estimate of about 300,000 transcripts per cell was used to convert the abundances to copies 
per cell (46). For tissue-specific transcripts, only transcript tags expressed at nominally ≥10 
transcript copies per cell were considered in order to normalize for tissues with fewer total 
tags analyzed. 
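
The conversion from tag counts to copies per cell described above can be sketched as follows. The function name `copies_per_cell` is hypothetical; the 300,000 transcripts-per-cell figure is the estimate cited in the text.

```python
TRANSCRIPTS_PER_CELL = 300_000  # estimate used in the text (46)


def copies_per_cell(tag_count, total_tags, transcripts_per_cell=TRANSCRIPTS_PER_CELL):
    """Convert an observed tag count into estimated transcript copies per cell."""
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    return tag_count / total_tags * transcripts_per_cell


# A tag seen twice among the 3,496,829 transcript tags of the full data set:
print(round(copies_per_cell(2, 3_496_829), 1))  # prints 0.2
```

The same calculation gives the approximately 0.2 copies per cell quoted earlier as the detection floor imposed by requiring a tag to be observed twice.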

The following transcript data from this analysis are available electronically at the 
SAGEnet web site (http://www.sagenet.org/transcriptome) with the corresponding 
expression levels and UniGene descriptions: 134,135 unique transcript tags identified from 
3.5 million total transcript tags; 69,381 transcript tags identified from colon cancer cells; 
217 transcripts that are exclusively expressed in colon epithelium, keratinocytes, breast 
epithelium, lung epithelium, melanocytes, kidney epithelium and cells from prostate and 
brain; 987 transcripts that were expressed in all tissues. Individual transcript libraries from 
a total of ~800,000 transcript tags from colon epithelium, normal brain, colon cancer, and 
brain cancer are available at the SAGEmap web site (http://www.ncbi.nlm.nih.gov/SAGE) 
(Lai et al., in press). 



EXAMPLE 3 

Estimation of the number of genes present in the human genome 

The transcripts detected by SAGE provide an estimate of the number of genes 
present in the human genome. Historically, estimates of the number of unique genes in the 
genome have ranged from 60,000 to over 100,000 genes using analyses of EST clustering 
(15), frequency of genes in characterized genomic regions, frequency of CpG islands (16), 
and RNA-cDNA reassociation kinetics (4). If one were to assume that each unique 
transcript tag observed by SAGE corresponded to a unique gene, our data would indicate 
that there are approximately 134,000 genes in the human genome. 

However, such an approach is likely to overestimate the number of unique genes in 
the genome, as distinct transcripts can be derived from a single gene. Multiple sites for 
polyadenylation (17), alternative splicing, premature transcriptional termination (18), as 
well as polymorphisms in the SAGE tag or nearby restriction endonuclease site could lead 
to multiple transcript tags for any one gene. An analysis of all publicly available 3' 
end-derived ESTs revealed that this was the case for many transcripts, and provided an 
estimate of the multiplicity of transcripts expected for individual genes. 37,534 distinct 3' 
transcripts containing polyadenylation signals or polyadenylated tails were observed to 
correspond to 23,534 unique UniGene clusters, an average 1.6 different transcripts per 
gene. Applying a similar calculation to our SAGE data would suggest that the 134,135 
transcripts observed corresponded to 84,103 unique genes. As our SAGE data is by no 
means a complete analysis of transcripts from all possible tissues, this estimate would 
provide a lower boundary for the number of unique genes in the genome. This figure is 
significantly higher than the 65,538 genes estimated from a clustering of 982,808 ESTs 
(UniGene Build 70) (15), and suggests that a substantial number of genes expressed at low 
levels may not be present in current EST databases. 
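
The arithmetic behind this correction can be sketched as follows. This is an illustration only: the function name is hypothetical, and the calculation simply assumes that the transcripts-per-gene ratio seen in the 3' EST data (37,534 transcripts for 23,534 UniGene clusters) applies equally to the SAGE tags.

```python
# Illustrative sketch; the function name is hypothetical and the figures
# are the ones quoted in the text above.

def estimate_unique_genes(distinct_tags, est_transcripts, est_clusters):
    """Scale a count of distinct transcript tags by the average number of
    transcripts per gene observed in 3' EST data."""
    transcripts_per_gene = est_transcripts / est_clusters
    return distinct_tags / transcripts_per_gene

# Average multiplicity of transcripts per gene (about 1.6).
multiplicity = 37534 / 23534

# Distinct SAGE transcript tags across all tissues, corrected for multiplicity.
genes_all_tissues = estimate_unique_genes(134135, 37534, 23534)

print(round(multiplicity, 1))
print(round(genes_all_tissues))
```

Under these assumptions the sketch reproduces the figure of 84,103 unique genes given above.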

EXAMPLE 4 

Assessment of transcriptome complexity 

Assessment of transcriptome complexity requires a relatively complete sampling of 
a transcriptome for the cell type under analysis. Human cells are thought to contain close 
to 300,000 mRNA molecules, and therefore an analysis of at least several hundred thousand 
transcripts would be needed. Approximately 350,000 and 300,000 transcripts were 
analyzed from DLD1 and HCT116 colorectal cancer cells, respectively. As these cancer 
cells are diploid, have similar genetic and phenotypic properties, and have very similar 
gene expression patterns (see below), transcript tags obtained from these cells were 
analyzed in combination as well as individually. 

Analysis of either cell line afforded approximately a one fold coverage of the 300,000 
mRNA molecules in a cell, while the combined set represented a two fold coverage even 
for mRNA molecules present at a single copy per cell. Measurement of ascertained new 
tags at increasing increments of tags indicated that the fraction of new transcripts from 
analysis of additional tags approached 0 at approximately 650,000 tags in the combined set 
(FIG. 1). This suggested that generation of further SAGE tags would yield few additional 
genes, and Monte Carlo simulations indicated that analysis of 643,283 tags would identify 
at least one tag for a given transcript 96% of the time if its expression level was at least two 
transcript copies per cell, and 83% of the time if its expression level was at least one 
transcript copy per cell. 
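
The coverage figures can be illustrated with a short calculation. The sketch below is not the Monte Carlo simulation referred to above, whose details are not given here. It is an idealized model that assumes every tag is an independent draw from a pool of 300,000 mRNA molecules per cell, and all names in it are hypothetical.

```python
# Idealized sampling sketch (an assumption, not the simulation used in the text).

MRNA_PER_CELL = 300_000  # approximate mRNA molecules per cell, as stated in the text

def fold_coverage(tags_sampled, mrna_per_cell=MRNA_PER_CELL):
    """Number of tags analyzed per mRNA molecule in a cell."""
    return tags_sampled / mrna_per_cell

def detection_probability(copies_per_cell, tags_sampled, mrna_per_cell=MRNA_PER_CELL):
    """Probability of observing at least one tag for a transcript, assuming
    each tag is drawn independently and without error."""
    p_single = copies_per_cell / mrna_per_cell
    return 1.0 - (1.0 - p_single) ** tags_sampled

combined_tags = 643_283
print(round(fold_coverage(combined_tags), 2))
print(round(detection_probability(1, combined_tags), 3))
print(round(detection_probability(2, combined_tags), 3))
```

This idealized model gives roughly 88% and 99% for one and two copies per cell, somewhat higher than the 83% and 96% reported from the Monte Carlo simulations, which presumably accounted for factors not modeled in this sketch.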

The combined 643,283 transcript tags represented 69,381 unique transcripts, of which 
44,174 corresponded to known genes or ESTs in the GenBank or UniGene databases while 
25,207 represented previously undescribed transcripts (Table 2). Even when accounting 
for multiple unique transcripts per gene, these transcripts would represent at least 43,502 
unique genes. This is substantially higher than the previous estimate of 15,000-25,000 
expressed genes obtained by RNA-DNA reassociation kinetics in a variety of human cell 
types (4), and suggests that a significant fraction of the genome may be expressed in 
individual cell types. As the kinetics of reassociation of a particular class of RNA and 
cDNA may be affected by a number of experimental variables and may underestimate 
transcripts of low abundance (4), it is not surprising that our studies have detected a higher 
number of expressed genes than estimated by hybridization analysis in both human cells 
(Table 2) and yeast. 

EXAMPLE 5 

Expression levels of transcripts in colon cancer cells 

Expression levels of transcripts in the colon cancer cell ranged from 0.5 to 2341 
copies per cell. The 61 transcripts expressed at over 500 transcript copies per cell made up 
nearly 1/4 of the mRNA mass of the cell and the most highly expressed 623 genes 
accounted for 54% of the mRNA content. In contrast, the vast majority of unique transcripts 
were expressed at low levels, with just under 23% of the mRNA mass of the cell 
comprising 90% of the unique transcripts expressed (Table 2). A "virtual Rot" analysis of 
the expressed transcripts identified a relatively continuous distribution of gene expression 
without markedly discrete abundance classes, similar to those observed in previous Rot 
studies of human cancer cells (20) (FIG. 2). 
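
As a rough illustration of how tag counts relate to the expression levels quoted above, the following sketch converts a tag count into copies per cell and computes the share of mRNA mass contributed by the most abundant transcripts. The linear conversion and the function names are assumptions made for illustration, and the small list of counts is invented example data.

```python
# Illustrative sketch; the conversion is an assumed linear scaling and the
# example counts are invented.

def copies_per_cell(tag_count, total_tags, mrna_per_cell=300_000):
    """Scale a tag count to transcript copies per cell, assuming about
    300,000 mRNA molecules per cell."""
    return tag_count / total_tags * mrna_per_cell

def mrna_fraction(tag_counts, top_n):
    """Fraction of all tags contributed by the top_n most abundant transcripts."""
    ordered = sorted(tag_counts, reverse=True)
    return sum(ordered[:top_n]) / sum(ordered)

# A transcript seen once among the 643,283 combined tags.
print(round(copies_per_cell(1, 643_283), 1))

# Invented counts: the two most abundant transcripts carry most of the mass.
print(mrna_fraction([50, 30, 10, 5, 5], top_n=2))
```

Under this scaling a single observed tag corresponds to about 0.5 copies per cell, which is consistent with the lower end of the range reported above.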

The identities of the expressed genes reveal the diversity of expression of a human 
transcriptome (data available at http://www.sagenet.org/transcriptome). For example, 
highly expressed genes often encoded proteins important in protein synthesis, energy 
metabolism, cellular structure and certain tissue specific functions. Moderate and low 
abundance genes accounted for a multitude of cellular processes including protein 
modification enzymes, DNA replication machinery, cell surface receptors, components of 
signal transduction pathways and transcription factors as well as many other transcripts 
with currently unknown functions. 

EXAMPLE 6 

Differences in gene expression between different tissues 

Differences in gene expression between different tissues may provide insights into 
the specialized processes underlying human physiology in normal and diseased states. In 
line with previous observations, overall gene expression patterns among the 19 different 
tissues analyzed were similar (examples in FIGS. 3A-3C). Changes in gene expression 
between physiologic states of a particular cell type or between patient samples of the same 
tissue were less than changes between cell types of different origins (FIGS. 3A-3C). 
Likewise, only a small fraction of transcripts was exclusively expressed in a particular 
normal or disease tissue. Detailed analyses of transcripts from epithelia of colon, breast, 
lung, and kidney, melanocytes, and cells from prostate and brain, identified transcripts that 
were nominally expressed at greater than 10 copies per cell in one tissue but not in any 
other tissue studied. The fraction of these tissue-specific transcripts ranged from 0.05% in 
normal prostate to 1.76% in normal colon epithelium (Table 3). Approximately 50% of 
these transcript tags matched known genes or ESTs (examples in Table 3 and data available 
at http://www.sagenet.org/transcriptome). Some of these transcripts identified genes 
already reported to be important for tissue specific processes. For example, brain specific 
transcripts such as GABA receptor, myelin basic protein, and synaptopodin are known to 
be important for synaptic transmission (21), formation and maintenance of the myelin 
sheath (22), and dendrite shape and motility (23), respectively. Likewise, 
guanylin/uroguanylin (24), carbonic anhydrase 1 (25), and CDX2 (26) are known to be 
expressed in colonic epithelium. 5,6-dihydroxyindole-2-carboxylic acid oxidase has been 
shown to have an important role for normal melanocyte pigment synthesis (27), while 
expression of MART-1 and melastatin may have clinical implications for melanoma 
patients (28, 29). However, the vast majority of the tissue specific transcripts observed 
have not been previously reported in the literature and their roles in the tissues examined 
remain to be elucidated. 

EXAMPLE 7 

Minimal transcriptome 

Nearly 1000 transcripts were detected that were expressed at 5 or more transcript copies per 
cell in every cell type analyzed. These expressed genes represent a view into the "minimal 
transcriptome," the set of genes expressed in all human cells. Such genes, listed in order 
of their uniformity of expression in Table 4 (and available at 
http://www.sagenet.org/transcriptome), largely represent well known constitutive or 
housekeeping genes thought to provide the molecular machinery necessary for basic 
functions of cellular life (4). Genes involved in DNA, RNA, protein, lipid and 
oligosaccharide biosynthesis as well as in energy metabolism were among those observed. 
Additionally, genes from other functional classes including structural proteins (e.g. 
dystroglycan and myosin light chain), signaling molecules (e.g. 14-3-3 proteins and 
MAPKK2), proteins with compartmentalized functions (e.g. lysosome-associated 
membrane glycoprotein and ER lumen retaining protein receptor 1), cell surface receptors 
(e.g. FGF receptor and STRL22 G protein coupled receptor), proteins involved in 
intracellular transport (e.g. syntaxin and alpha SNAP), membrane transporters (e.g. Na+/K+ 
ATPase and mitochondrial F1/F0 ATPase), and enzymes involved in post-translational 
modification and protein degradation (e.g. kinases, phosphatases and proteasome 
components) were observed and were not previously known to be ubiquitously expressed. 
Well known genes often used as experimental controls such as glyceraldehyde 3-phosphate 
dehydrogenase, elongation factor 1 alpha, and gamma actin were observed but varied in 
expression as much as 6 fold among different cell types. 

EXAMPLE 8 

Genes involved in tumorigenesis 

Genes that are uniformly expressed in cancers but expressed at lower levels in normal 
tissues may turn out to be important for tumorigenesis, and demonstrate how gene 
expression patterns might be useful in the analysis of disease states. We detected 40 genes 
that were expressed in all cancer tissues examined at levels of at least 3 transcript copies per cell and 
whose expression was at least 2-fold higher in each cancer compared to its corresponding 
normal tissue (Table 5). Four of these transcripts had no matches to known genes and 15 
matched ESTs with no known function. Several of the highly induced transcripts provided 
tantalizing clues about their roles in tumorigenesis. For example, S100A4 has been thought 
to play a role in late stage tumorigenesis as it is overexpressed in colorectal 
adenocarcinomas but not adenomas (30), and its induction can promote (while its inhibition 
can prevent) metastasis in tumor models. Midkine, a heparin-binding growth factor, has 
been reported to be overexpressed in certain cancers (34), to transform cells in vitro (35), 
and to promote tumor angiogenesis in vivo. Finally, overexpression of survivin, an IAP 
apoptosis inhibitor (37), has been recently shown to predict shorter survival rates in 
colorectal cancer patients and may carry out its antiapoptotic functions as a mitotic spindle 
checkpoint factor (39). The observed elevated expression of such genes in many tumor 
types indicates a potentially general role for these genes in tumorigenesis and suggests they 
may be useful as diagnostic markers or targets for therapeutic intervention. 
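
The selection criteria described in this example can be expressed as a simple filter. The sketch below is illustrative only: the function name and the example expression values are hypothetical, and it treats the figure of 3 transcript copies per cell as a minimum threshold, which is an assumption.

```python
# Illustrative filter; names and example values are hypothetical.

def elevated_in_all_cancers(cancer, normal, min_copies=3.0, min_fold=2.0):
    """Return True if a transcript is expressed in every cancer at or above
    min_copies per cell and at least min_fold higher than in the matching
    normal tissue.

    cancer, normal: dicts mapping tissue name -> copies per cell.
    """
    for tissue, cancer_level in cancer.items():
        normal_level = normal[tissue]
        if cancer_level < min_copies:
            return False
        if cancer_level < min_fold * normal_level:
            return False
    return True

# Invented example: elevated in both cancers relative to matched normal tissue.
print(elevated_in_all_cancers({"colon": 12.0, "brain": 8.0},
                              {"colon": 4.0, "brain": 3.0}))
```

A transcript that misses either threshold in even one tissue pair is excluded, which mirrors the requirement that elevation be seen in each cancer examined.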

EXAMPLE 9 

Estimate of gene number 

The 134,135 distinct transcripts identified in this study, corresponding to 
approximately 84,103 unique genes, provided an estimate of gene number substantially 
higher than the recent estimate (~65,000 genes) derived from extant EST clusters. What 
could account for the difference between these estimates, considering that both are derived 
from sequencing of transcripts from similar cell types? One explanation is that the 
clustering estimate is based on the number of observed EST clusters (62,236) divided by 
a measure of the completeness of the EST database. The latter value is calculated as the 
fraction of "characterized" genes in GenBank that already have EST matches (~95%). The 
characterized genes in GenBank have been assumed to be representative of the rest of the 
genes in the human genome, but our SAGE data indicated that their average expression was 
more than 10 fold higher than the mean levels of gene expression. Similarly, the number 
of ESTs that were present in clusters with characterized genes was approximately 12 fold 
higher than clusters composed entirely of ESTs. Such highly expressed genes would be 
more likely to be represented in transcript databases, thereby leading to an overestimation 
of the completeness of the EST databases, and an underestimation of the number of unique 
genes. Indeed, the number of UniGene clusters continues to grow as a greater diversity of 
tissues is analyzed through the Cancer Genome Anatomy Project, and as of the date of 
submission of this manuscript already exceeds the recent EST derived estimate (71,849 
gene clusters in Build 80 versus 65,538 predicted from Build 70). 
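
The effect of an overestimated completeness value on the clustering estimate can be seen in a short calculation. This is a sketch under stated assumptions: the function name is hypothetical, the 0.95 value is the approximate completeness quoted above, and the lower completeness value is an invented figure used only to show the direction of the bias.

```python
# Illustrative sketch; the function name is hypothetical and the second
# completeness value is invented to show the direction of the bias.

def estimate_total_genes(observed_clusters, completeness):
    """Clustering-based estimate: observed EST clusters divided by the
    assumed completeness of the EST database."""
    return observed_clusters / completeness

# Using the approximate 95% completeness quoted in the text.
estimate_95 = estimate_total_genes(62236, 0.95)

# If the database were less complete than assumed, the estimate would rise.
estimate_lower = estimate_total_genes(62236, 0.75)

print(round(estimate_95))
print(estimate_lower > estimate_95)
```

With the rounded 95% value the result falls close to the 65,538 genes cited above, and any downward revision of the completeness figure pushes the estimate upward.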

Like other genome-wide analyses, studies of human transcriptomes using SAGE have 
several potential limitations. First, a small number of transcripts would be expected to lack 
the restriction enzyme site required to produce the 14 bp tags, and would therefore not be 
detected by our analyses (12). Second, our study was limited to the 19 tissues analyzed. 
Genes uniquely expressed in other tissues would not have been detected, and accordingly, 
genes observed to be tissue specific in our studies may turn out to be expressed in other 
normal or disease states. Finally, identification of genes corresponding to specific tags is 
mainly based on large but incomplete databases of ESTs and characterized genes. SAGE 
tags without matches to existing databases can directly be used to identify previously 
uncharacterized genes (1, 12, 40), but additional 3' EST data, as well as that of genomic 
regions would make gene identification more rapid. 

CLAIMS 

1. A method of identifying a cell as either a colon epithelial cell, a brain cell, a 
keratinocyte, a breast epithelial cell, a lung epithelial cell, a melanocyte, a prostate cell, or 
a kidney epithelial cell, comprising the step of: 

determining expression in a test cell of a gene product of at least one gene 
comprising a sequence selected from at least one of the following groups: 

(a) the sequences shown in SEQ ID NOS:2, 5-18, 20-84, and 85; 

(b) the sequences shown in SEQ ID NOS:87-96, 98, 100-103, 105, 107-110, 
112-129, and 131-150, and 151; 

(c) the sequences shown in SEQ ID NOS:152-154, and 155; 

(d) the sequences shown in SEQ ID NOS:156-159, and 160; 

(e) the sequences shown in SEQ ID NOS:161-166, and 167; 

(f) the sequences shown in SEQ ID NOS:168, 170, 172-177, 179-188, 
190-207, and 208; 

(g) the sequences shown in SEQ ID NOS:209 and 210; and 

(h) the sequences shown in SEQ ID NOS:211-224 and 225, 

wherein expression of a gene product of at least one gene comprising a sequence 
shown in (a) identifies the test cell as a colon epithelial cell; 

wherein expression of a gene product of at least one gene comprising a sequence 
shown in (b) identifies the test cell as a brain cell; 

wherein expression of a gene product of at least one gene comprising a sequence 
shown in (c) identifies the test cell as a keratinocyte; 

wherein expression of a gene product of at least one gene comprising a sequence 
shown in (d) identifies the test cell as a breast epithelial cell; 

wherein expression of a gene product of at least one gene comprising a sequence 
shown in (e) identifies the test cell as a lung epithelial cell; 

wherein expression of a gene product of at least one gene comprising a sequence 
shown in (f) identifies the test cell as a melanocyte; 

wherein expression of a gene product of at least one gene comprising a sequence 
shown in (g) identifies the test cell as a prostate cell; and 

wherein expression of a gene product of at least one gene comprising a sequence 
shown in (h) identifies the test cell as a kidney epithelial cell. 

2. The method of claim 1 wherein expression of gene products of at least two 
of said genes is determined. 

3. The method of claim 1 wherein expression of gene products of at least five 
of said genes is determined. 

4. The method of claim 1 wherein the gene product is protein. 

5. The method of claim 1 wherein the gene product is RNA. 

6. The method of claim 5 wherein expression is determined using at least one 
oligonucleotide probe. 

7. The method of claim 5 wherein expression is determined using at least two 
oligonucleotide probes. 

8. The method of claim 6 wherein the at least one oligonucleotide probe is 
immobilized on a solid support. 

9. The method of claim 8 wherein the at least one oligonucleotide probe is in an 
array. 

10. The method of claim 1 wherein the cell to be identified is a cancer cell. 

11. An isolated polynucleotide comprising a sequence selected from the group 
consisting of SEQ ID NOS:2, 5, 6, 8, 10, 12, 13, 15, 17, 18, 21, 24-26, 28, 30, 31, 34-36, 
38, 40, 47-51, 53-57, 59-62, 65-69, 71-76, 78, 80-84, 98, 103, 113, 115, 122, 129, 132, 134, 
135, 140, 144, 149, 150, 153-168, 174-176, 182, 185, 186, 188, 190, 200, 201, 205-213, 
216-224, 237, 239, 257, 263, 485, 487, 495, 499, 514, 586, 686, 751, 835, 844, 878, 910, 
925, 932, 951, 1000, 1005, 1070, 1122, 1130, 1170, 1173, 1187, 1189, 1200, 1213, 1220, 
1237, 1257, 1264, 1273, 1293, 1300, 1320, 1367, 1371, 1401, 1403, 1404, 1406, 1418, and 
1419. 

12. A solid support comprising at least one polynucleotide comprising a sequence 
selected from at least one of the following groups: 

(a) the sequences shown in SEQ ID NOS:2, 5, 6, 8, 10, 12, 13, 15, 17, 18, 21, 
24-26, 28, 30, 31, 34-36, 38, 40, 47-51, 53-57, 59-62, 65-69, 71-76, 78, 80-83, and 84; 

(b) the sequences shown in SEQ ID NOS:98, 103, 113, 115, 122, 129, 132, 
134, 135, 140, 144, 149, and 150; 

(c) the sequences shown in SEQ ID NOS:153-154 and 155; 

(d) the sequences shown in SEQ ID NOS:156-157 and 160; 

(e) the sequences shown in SEQ ID NOS:161-166 and 167; 

(f) the sequences shown in SEQ ID NOS:168, 174-176, 182, 185, 186, 188, 
190, 200, 201, 205-207 and 208; 

(g) the sequences shown in SEQ ID NOS:209 and 210; 

(h) the sequences shown in SEQ ID NOS:211-213, 216-223, and 224; 

(i) the sequences shown in SEQ ID NOS:237, 239, 257, and 263; or 

(j) the sequences shown in SEQ ID NOS:485, 487, 495, 499, 514, 586, 686, 
751, 835, 844, 878, 910, 925, 932, 951, 1000, 1005, 1070, 1122, 1130, 1170, 1173, 1187, 
1189, 1200, 1213, 1220, 1237, 1257, 1264, 1273, 1293, 1300, 1320, 1367, 1371, 1401, 
1403, 1404, 1406, 1418, and 1419. 

13. The solid support of claim 12 wherein: 

if the at least one polynucleotide comprises a sequence selected from (a), then 
the solid support further comprises a polynucleotide comprising a sequence selected from 
the group consisting of the sequences shown in SEQ ID NOS:1, 3, 4, 7, 9, 11, 14, 16, 19, 
20, 22, 23, 27, 29, 32, 33, 37, 39, 41-46, 52, 58, 63, 64, 70, 77, 79, and 85; 

if the at least one polynucleotide comprises a sequence selected from (b), then 
the solid support further comprises a polynucleotide comprising a sequence selected from 
the group consisting of the sequences shown in SEQ ID NOS:86-97, 99-102, 104-112, 114, 
116-121, 123-128, 130, 131, 133, 136-139, 141-143, 145-148, and 151; 

if the at least one polynucleotide comprises a sequence selected from (c), then 
the solid support further comprises a polynucleotide comprising the sequence shown in 
SEQ ID NO:152; 

if the at least one polynucleotide comprises a sequence selected from (f), then 
the solid support further comprises a polynucleotide comprising a sequence selected from 
the group consisting of the sequences shown in SEQ ID NOS:169-173, 177-181, 183, 184, 
187, 189, 191-199, 202, 203, and 204; 

if the at least one polynucleotide comprises a sequence selected from (h), then 
the solid support further comprises a polynucleotide comprising a sequence selected from 
the group consisting of the sequences shown in SEQ ID NOS:214, 215, and 225; 

if the at least one polynucleotide comprises a sequence selected from (i), then 
the solid support further comprises a polynucleotide comprising a sequence selected from 
the group consisting of the sequences shown in SEQ ID NOS:226-236, 238, 240-256, 
258-262, 264, and 265; and 

if the at least one polynucleotide comprises a sequence selected from (j), then 
the solid support further comprises a polynucleotide comprising a sequence selected from 
the group consisting of the sequences shown in SEQ ID NOS:266-484, 486, 488-494, 496-498, 
500-513, 515-585, 587-685, 687-750, 752-834, 836-843, 845-877, 879-909, 911-924, 
926-931, 933-950, 952-999, 1001-1004, 1006-1069, 1071-1121, 1123-1129, 1131-1169, 
1171, 1171, 1174-1186, 1188, 1190-1199, 1201-1212, 1214-1219, 1221-1236, 1238-1256, 
1258-1263, 1265-1272, 1274-1292, 1294-1299, 1301-1319, 1321-1366, 1368-1370, 1372-1400, 
1402, 1405, 1407-1416, and 1417. 

14. The solid support of claim 12, wherein: 

if the at least one polynucleotide comprises a sequence selected from (a), then 
the at least one polynucleotide further comprises a sequence selected from the group 
consisting of the sequences shown in SEQ ID NOS:1, 3, 4, 7, 9, 11, 14, 16, 19, 20, 22, 23, 
27, 29, 32, 33, 37, 39, 41-46, 52, 58, 63, 64, 70, 77, 79, and 85; 

if the at least one polynucleotide comprises a sequence selected from (b), then 
the at least one polynucleotide further comprises a sequence selected from the group 
consisting of the sequences shown in SEQ ID NOS:86-97, 99-102, 104-112, 114, 116-121, 
123-128, 130, 131, 133, 136-139, 141-143, 145-148, and 151; 

if the at least one polynucleotide comprises a sequence selected from (c), 
then the at least one polynucleotide further comprises SEQ ID NO:152; 

if the at least one polynucleotide comprises a sequence selected from (f), then 
the at least one polynucleotide further comprises a sequence selected from the group 
consisting of the sequences shown in SEQ ID NOS:169-173, 177-181, 183, 184, 187, 189, 
191-199, 202, 203, and 204; 

if the at least one polynucleotide comprises a sequence selected from (h), then 
the at least one polynucleotide further comprises a sequence selected from the group 
consisting of the sequences shown in SEQ ID NOS:214, 215, and 225; 

if the at least one polynucleotide comprises a sequence selected from (i), then 
the at least one polynucleotide further comprises a sequence selected from the group 
consisting of the sequences shown in SEQ ID NOS:226-236, 238, 240-256, 258-262, 264, 
and 265; and 

if the at least one polynucleotide comprises a sequence selected from (j), then 
the at least one polynucleotide further comprises a sequence selected from the group 
consisting of the sequences shown in SEQ ID NOS:266-484, 486, 488-494, 496-498, 500-513, 
515-585, 587-685, 687-750, 752-834, 836-843, 845-877, 879-909, 911-924, 926-931, 
933-950, 952-999, 1001-1004, 1006-1069, 1071-1121, 1123-1129, 1131-1169, 1171, 1171, 
1174-1186, 1188, 1190-1199, 1201-1212, 1214-1219, 1221-1236, 1238-1256, 1258-1263, 
1265-1272, 1274-1292, 1294-1299, 1301-1319, 1321-1366, 1368-1370, 1372-1400, 1402, 
1405, 1407-1416, and 1417. 

15. The solid support of claim 12 wherein the at least one polynucleotide is in an 
array. 

16. A method of identifying a test cell as a cancer cell, comprising the step of: 

determining expression in a test cell of a gene product of at least one gene 
comprising a sequence selected from the group consisting of SEQ ID NOS:228, 230-257, 
259-260, and 262-265, wherein an increase in said expression of at least two-fold relative 
to expression of the at least one gene in a normal cell identifies the test cell as a cancer cell. 

17. The method of claim 16 wherein expression of gene products of at least two 
of said genes is determined. 

18. The method of claim 16 wherein expression of gene products of at least five 
of said genes is determined. 

19. The method of claim 16 wherein the gene product is protein. 

20. The method of claim 16 wherein the gene product is RNA. 

21. The method of claim 20 wherein expression is determined using at least one 
oligonucleotide probe. 

22. The method of claim 21 wherein expression is determined using at least two 
oligonucleotide probes. 

23. The method of claim 21 wherein the at least one oligonucleotide probe is 
immobilized on a solid support. 

24. The method of claim 23 wherein the at least one oligonucleotide probe is in 
an array. 

25. The method of claim 16 wherein the test cell is selected from the group 
consisting of a colon epithelial cell, a breast epithelial cell, a lung epithelial cell, a 
melanocyte, and a brain cell. 

26. The method of claim 16 wherein the normal cell and the test cell are selected 
from a single cell type. 

27. A method of reducing expression of a cancer-specific gene in a human cell, 
comprising the step of: 

administering to the cell a reagent which specifically binds to an expression 
product of a cancer-specific gene comprising a sequence selected from the group consisting 
of SEQ ID NOS:228, 230-257, 259-260, and 262-265, whereby expression of the cancer- 
specific gene is reduced relative to expression of the cancer-specific gene in the absence 
of the reagent. 

28. The method of claim 27 wherein the reagent is an antisense oligonucleotide. 

29. The method of claim 27 wherein the reagent is an antibody. 

30. A method for comparing expression of a gene in a test sample to expression 
of a gene in a standard sample, comprising the steps of: 

determining a first ratio and a second ratio, wherein the first ratio is an 
amount of an expression product of a test gene in a test sample to an amount of an 
expression product of at least one gene comprising a sequence selected from the group 
consisting of SEQ ID NOS:266-375, 377-652, 654-796, and 798-1448 in the test sample, 
and wherein the second ratio is an amount of an expression product of the test gene in a 
standard sample to an amount of an expression product of the at least one gene in the 
standard sample; and 

comparing the first and second ratios, wherein a difference between the first 
and second ratios indicates a difference in the amount of the expression product of the test 
gene in the test sample. 

31. The method of claim 30 wherein the at least one gene comprises a sequence 
selected from the group consisting of SEQ ID NOS:282, 288, 300, 302, 308, 320, 323, 
363, 368, 379, 381, 444, 453, 518, 531, 535, 538, 542, 579, 580, 594, 600, 604, 617, 626, 
641, 650, 717, 728, 776, 777, 794, 818, 822, 842, 885, 887, 899, 900, 902, 904, 914, 930, 
960, 964, 1001, 1015, 1020, 1027, 1035, 1090, 1113, 1119, 1146, 1151, 1163, 1233, 1235, 
1252, 1255, 1270, 1340, 1345, 1356, 1359, 1360, 1362, 1385, 1415, and 1441. 

32. The method of claim 30 wherein expression is determined using at least one 
oligonucleotide probe. 

33. The method of claim 32 wherein the at least one oligonucleotide probe is 
immobilized on a solid support. 

34. The method of claim 33 wherein the at least one oligonucleotide probe is in 
an array. 

35. The method of claim 30 wherein the test sample is a cancer cell and the 
standard sample is a normal cell. 

36. The method of claim 35 wherein the cancer cell is selected from the group 
consisting of a colon cancer cell, a breast cancer cell, a lung cancer cell, a melanoma cell, 
and a brain cancer cell. 

37. The method of claim 30 wherein the test sample has been treated with a test 
compound and the standard sample has not been treated with the test compound. 

38. The method of claim 37 wherein the test sample is a cancer cell and wherein 
the standard sample is a normal cell. 

39. The method of claim 30 wherein the test sample and the standard sample are 
obtained from the same cell type. 

40. A method of screening candidate anti-cancer drugs, comprising the steps of: 
contacting a cancer cell with a test compound; and 

measuring expression in the cancer cell of a gene product of at least one gene 
comprising a sequence selected from the group consisting of SEQ ID NOS: 228, 230-257, 
259, 260, 262-263, and 265, wherein a decrease in expression of the gene product in the 
presence of a test compound relative to expression of the gene product in the absence of the 
test compound identifies the test compound as a potential anti-cancer drug. 

41. The method of claim 40 wherein the cancer cell is selected from the group 
consisting of a colon cancer cell, a breast cancer cell, a lung cancer cell, a melanoma cell, 
and a brain cancer cell. 

42. The method of claim 40 in which expression of gene products of at least two 
of said genes is measured. 

43. The method of claim 40 in which expression of gene products of at least five 
of said genes is measured. 

44. The method of claim 40 wherein the gene product is protein. 

45. The method of claim 40 wherein the gene product is RNA. 

46. The method of claim 45 wherein expression of the at least one gene product 
is measured using at least one oligonucleotide probe. 

47. The method of claim 46 wherein the at least one oligonucleotide probe is 
immobilized on a solid support. 

48. The method of claim 47 wherein the at least one oligonucleotide probe is in 
an array. 

49. The method of claim 46 wherein the at least one oligonucleotide probe 
comprises a sequence selected from the group consisting of SEQ ID NOS:237, 239, 257, 
and 263. 

50. A method of screening test compounds for the ability to increase an organ or 
cell function, comprising the step of: 

contacting a cell selected from the group consisting of a colon epithelial cell, 
a brain cell, a keratinocyte, a breast epithelial cell, a lung epithelial cell, a melanocyte, a 
prostate cell, and a kidney cell with a test compound; and 

measuring expression in the cell of a gene product of at least one gene 
comprising a sequence selected from at least one of the following groups: 

(a) the sequences shown in SEQ ID NOS:2, 5-18, 20-84, and 85; 

(b) the sequences shown in SEQ ID NOS:87-96, 98, 100-103, 105, 
107-110, 112-129, 131-150, and 151; 

(c) the sequences shown in SEQ ID NOS:152-154, and 155; 

(d) the sequences shown in SEQ ID NOS:156-159 and 160; 

(e) the sequences shown in SEQ ID NOS:161-166 and 167; 

(f) the sequences shown in SEQ ID NOS:168, 170, 172-177, 179-188, 
190-207, and 208; 

(g) the sequences shown in SEQ ID NOS:209 and 210; and 

(h) the sequences shown in SEQ ID NOS:211-224 and 225, 

wherein an increase in expression of a gene product of at least one gene comprising 
a sequence selected from (a) identifies the test compound as a potential drug for increasing 
a function of a colon cell; 

wherein an increase in expression of a gene product of at least one gene comprising 
a sequence selected from (b) identifies the test compound as a potential drug for increasing 
a function of a brain cell; 

wherein an increase in expression of a gene product of at least one gene comprising 
a sequence selected from (c) identifies the test compound as a potential drug for increasing 
a function of a skin cell; 

wherein an increase in expression of a gene product of at least one gene comprising 
a sequence selected from (d) identifies the test compound as a potential drug for increasing 
a function of a breast cell; 

wherein an increase in expression of a gene product of at least one gene comprising 
a sequence selected from (e) identifies the test compound as a potential drug for increasing 
a function of a lung cell; 

wherein an increase in expression of a gene product of at least one gene comprising 
a sequence selected from (f) identifies the test compound as a potential drug for increasing 
a function of a melanocyte; 

wherein an increase in expression of a gene product of at least one gene comprising 
a sequence selected from (g) identifies the test compound as a potential drug for increasing 
a function of a prostate cell; and 

wherein an increase in expression of a gene product of at least one gene comprising 
a sequence selected from (h) identifies the test compound as a potential drug for increasing 
a function of a kidney cell. 

51. The method of claim 50 wherein expression of gene products of at least two 
of said genes is determined. 

52. The method of claim 50 wherein expression of gene products of at least five 
of said genes is determined. 

53. The method of claim 50 wherein the gene product is protein. 

54. The method of claim 50 wherein the gene product is RNA. 

55. The method of claim 54 wherein expression is determined using at least one 
oligonucleotide probe. 

56. The method of claim 54 wherein expression is determined using at least two 
oligonucleotide probes. 

57. The method of claim 55 wherein the at least one oligonucleotide probe is 
immobilized on a solid support. 

58. The method of claim 57 wherein the at least one oligonucleotide probe is in 
an array. 

59. A method to restore function to a diseased tissue or cell comprising the step of: 

delivering a gene to a diseased cell selected from the group consisting of a 
colon epithelial cell, a brain cell, a keratinocyte, a breast epithelial cell, a lung epithelial 
cell, a melanocyte, a prostate cell, and a kidney cell, wherein the gene comprises a 
nucleotide sequence selected from at least one of the following groups: 

(a) the sequences shown in SEQ ID NOS:2, 5-18, 20-84, and 85; 

(b) the sequences shown in SEQ ID NOS:87-96, 98, 100-103, 105, 
107-110, 112-129, 131-150, and 151; 

(c) the sequences shown in SEQ ID NOS:152-154, and 155; 

(d) the sequences shown in SEQ ID NOS:156-159 and 160; 

(e) the sequences shown in SEQ ID NOS:161-166 and 167; 

(f) the sequences shown in SEQ ID NOS:168, 170, 172-177, 179-188, 
190-207, and 208; 

(g) the sequences shown in SEQ ID NOS:209 and 210; and 

(h) the sequences shown in SEQ ID NOS:211-224 and 225, 

wherein expression of the gene in the diseased cell is less than expression of the gene in a 
corresponding cell which is normal, 

wherein if the diseased cell is a colon epithelial cell, then the nucleotide sequence is 
selected from (a); 

wherein if the diseased cell is a brain cell, then the nucleotide sequence is selected 
from (b); 

wherein if the diseased cell is a keratinocyte, then the nucleotide sequence is selected 
from (c); 

wherein if the diseased cell is a breast epithelial cell, then the nucleotide sequence 
is selected from (d); 

wherein if the diseased cell is a lung epithelial cell, then the nucleotide sequence is 
selected from (e); 

wherein if the diseased cell is a melanocyte, then the nucleotide sequence is selected 
from (f); 

wherein if the diseased cell is a prostate cell, then the nucleotide sequence is selected 
from (g); and 

wherein if the diseased cell is a kidney cell, then the nucleotide sequence is selected 
from (h). 

60. The method of claim 59 wherein the diseased cell fails to express the gene in 
the diseased state. 